(57) Abstract

A method for selecting recombinant host cells expressing high levels of a desired protein is described. This method utilizes eukaryotic 
host cells harboring a DNA construct comprising a selectable gene (preferably an amplifiable gene) and a product gene provided 3' to
the selectable gene. The selectable gene is positioned within an intron defined by a splice donor site and a splice acceptor site and the 
selectable gene and product gene are under the transcriptional control of a single transcriptional regulatory region. The splice donor site 
is generally an efficient splice donor site and thereby regulates expression of the product gene using the transcriptional regulatory region. 
The transfected cells are cultured so as to express the gene encoding the product in a selective medium comprising an amplifying agent 
for sufficient time to allow amplification to occur, whereupon either the desired product is recovered or cells having multiple copies of the 
product gene are identified.

METHOD FOR SELECTING HIGH-EXPRESSING HOST CELLS
BACKGROUND OF THE INVENTION 

Field of the Invention 

This invention relates to a method of selecting for high-expressing
host cells, a method of producing a protein of interest in high yields and
a method of producing eukaryotic cells having multiple copies of a sequence
encoding a protein of interest.

Description of Background and Related Art 

The discovery of methods for introducing DNA into living host cells
in a functional form has provided the key to understanding many fundamental
biological processes, and has made possible the production of important
proteins and other molecules in commercially useful quantities.

Despite the general success of such gene transfer methods, several
common problems exist that may limit the efficiency with which a gene
encoding a desired protein can be introduced into and expressed in a host
cell. One problem is knowing when the gene has been successfully
transferred into recipient cells. A second problem is distinguishing
between those cells that contain the gene and those that have survived the
transfer procedures but do not contain the gene. A third problem is
identifying and isolating those cells that contain the gene and that are
expressing high levels of the protein encoded by the gene.

In general, the known methods for introducing genes into eukaryotic
cells tend to be highly inefficient. Of the cells in a given culture, only
a small proportion take up and express exogenously added DNA, and an even
smaller proportion stably maintain that DNA.

Identification of those cells that have incorporated a product gene
encoding a desired protein typically is achieved by introducing into the
same cells another gene, commonly referred to as a selectable gene, that
encodes a selectable marker. A selectable marker is a protein that is
necessary for the growth or survival of a host cell under the particular
culture conditions chosen, such as an enzyme that confers resistance to an
antibiotic or other drug, or an enzyme that compensates for a metabolic or
catabolic defect in the host cell. For example, selectable genes commonly
used with eukaryotic cells include the genes for aminoglycoside
phosphotransferase (APH), hygromycin phosphotransferase (hyg),
dihydrofolate reductase (DHFR), thymidine kinase (tk), neomycin, puromycin,
glutamine synthetase, and asparagine synthetase.

The method of identifying a host cell that has incorporated one gene
on the basis of expression by the host cell of a second incorporated gene
encoding a selectable marker is referred to as cotransfectation (or
cotransfection). In that method, a gene encoding a desired polypeptide and
a selection gene typically are introduced into the host cell
simultaneously, although they may be introduced sequentially. In the case
of simultaneous cotransfectation, the gene encoding the desired polypeptide
and the selectable gene may be present on a single DNA molecule or on
separate DNA molecules prior to being introduced into the host cells.
Wigler et al., Cell, 16:777 (1979). Cells that have incorporated the gene
encoding the desired polypeptide then are identified or isolated by
culturing the cells under conditions that preferentially allow for the
growth or survival of those cells that synthesize the selectable marker
encoded by the selectable gene.

The level of expression of a gene introduced into a eukaryotic host
cell depends on multiple factors, including gene copy number, efficiency
of transcription, messenger RNA (mRNA) processing, stability, and
translation efficiency. Accordingly, high level expression of a desired
polypeptide typically will involve optimizing one or more of those factors.

For example, the level of protein production may be increased by
covalently joining the coding sequence of the gene to a "strong" promoter
or enhancer that will give high levels of transcription. Promoters and
enhancers are nucleotide sequences that interact specifically with proteins
in a host cell that are involved in transcription. Kriegler, Meth.
Enzymol., 185:512 (1990); Maniatis et al., Science, 236:1237 (1987).
Promoters are located upstream of the coding sequence of a gene and
facilitate transcription of the gene by RNA polymerase. Among the
eukaryotic promoters that have been identified as strong promoters for
high-level expression are the SV40 early promoter, adenovirus major late
promoter, mouse metallothionein-I promoter, Rous sarcoma virus long
terminal repeat, and human cytomegalovirus immediate early promoter (CMV).

Enhancers stimulate transcription from a linked promoter. Unlike
promoters, enhancers are active when placed downstream from the
transcription initiation site or at considerable distances from the
promoter, although in practice enhancers may overlap physically and
functionally with promoters. For example, all of the strong promoters
listed above also contain strong enhancers. Bendig, Genetic Engineering,
7:91 (Academic Press, 1988).

The level of protein production also may be increased by increasing
the gene copy number in the host cell. One method for obtaining high gene
copy number is to directly introduce into the host cell multiple copies of
the gene, for example, by using a large molar excess of the product gene
relative to the selectable gene during cotransfectation. Kaufman, Meth.
Enzymol., 185:537 (1990). With this method, however, only a small
proportion of the cotransfected cells will contain the product gene at high
copy number. Furthermore, because no generally applicable, convenient
method exists for distinguishing such cells from the majority of cells that
contain fewer copies of the product gene, laborious and time-consuming
screening methods typically are required to identify the desired high-copy
number transfectants.

Another method for obtaining high gene copy number involves cloning
the gene in a vector that is capable of replicating autonomously in the
host cell. Examples of such vectors include mammalian expression vectors
derived from Epstein-Barr virus or bovine papilloma virus, and yeast
2-micron plasmid vectors. Stephens & Hentschel, Biochem. J., 248:1 (1987);
Yates et al., Nature, 313:812 (1985); Beggs, Genetic Engineering, 2:175
(Academic Press, 1981).

Yet another method for obtaining high gene copy number involves gene
amplification in the host cell. Gene amplification occurs naturally in
eukaryotic cells at a relatively low frequency. Schimke, J. Biol. Chem.,
263:5989 (1988). However, gene amplification also may be induced, or at
least selected for, by exposing host cells to appropriate selective
pressure. For example, in many cases it is possible to introduce a product
gene together with an amplifiable gene into a host cell and subsequently
select for amplification of the marker gene by exposing the cotransfected
cells to sequentially increasing concentrations of a selective agent.
Typically the product gene will be coamplified with the marker gene under
such conditions.

The most widely used amplifiable gene for that purpose is a DHFR
gene, which encodes a dihydrofolate reductase enzyme. The selection agent
used in conjunction with a DHFR gene is methotrexate (Mtx). A host cell
is cotransfected with a product gene encoding a desired protein and a DHFR
gene, and transfectants are identified by first culturing the cells in
culture medium that contains Mtx. A suitable host cell when a wild-type
DHFR gene is used is the Chinese Hamster Ovary (CHO) cell line deficient
in DHFR activity, prepared and propagated as described by Urlaub & Chasin,
Proc. Nat. Acad. Sci. USA, 77:4216 (1980). The transfected cells then are
exposed to successively higher amounts of Mtx. This leads to the synthesis
of multiple copies of the DHFR gene, and concomitantly, multiple copies of
the product gene. Schimke, J. Biol. Chem., 263:5989 (1988); Axel et al.,
U.S. Patent No. 4,399,216; Axel et al., U.S. Patent No. 4,634,665. Other
references directed to co-transfection of a gene together with a genetic
marker that allows for selection and subsequent amplification include
Kaufman in Genetic Engineering, ed. J. Setlow (Plenum Press, New York),
Vol. 9 (1987); Kaufman and Sharp, J. Mol. Biol., 159:601 (1982); Ringold
et al., J. Mol. Appl. Genet., 1:165-175 (1981); Kaufman et al., Mol. Cell
Biol., 5:1750-1759 (1985); Kaetzel and Nilson, J. Biol. Chem.,
263:6244-6251 (1988); Hung et al., Proc. Natl. Acad. Sci. USA, 83:261-264 (1986);
Kaufman et al., EMBO J., 6:87-93 (1987); Johnston and Kucey, Science,
242:1551-1554 (1988); Urlaub et al., Cell, 33:405-412 (1983).

To extend the DHFR amplification method to other cell types, a mutant
DHFR gene that encodes a protein with reduced sensitivity to methotrexate
may be used in conjunction with host cells that contain normal numbers of
an endogenous wild-type DHFR gene. Simonsen and Levinson, Proc. Natl.
Acad. Sci. USA, 80:2495 (1983); Wigler et al., Proc. Natl. Acad. Sci. USA,
77:3567-3570 (1980); Haber and Schimke, Somatic Cell Genetics, 8:499-508
(1982).

Alternatively, host cells may be co-transfected with the product
gene, a DHFR gene, and a dominant selectable gene, such as a neo^r gene. Kim
and Wold, Cell, 42:129 (1985); Capon et al., U.S. Pat. No. 4,965,199.
Transfectants are identified by first culturing the cells in culture medium
containing neomycin (or the related drug G418), and the transfectants so
identified then are selected for amplification of the DHFR gene and the
product gene by exposure to successively increasing amounts of Mtx.

As will be appreciated from this discussion, the selection of
recombinant host cells that express high levels of a desired protein
generally is a multi-step process. In the first step, initial
transfectants are selected that have incorporated the product gene and the
selectable gene. In subsequent steps, the initial transfectants are
subject to further selection for high-level expression of the selectable
gene and then random screening for high-level expression of the product
gene. To identify cells expressing high levels of the desired protein,
typically one must screen large numbers of transfectants. The majority of
transfectants produce less than maximal levels of the desired protein.
Further, Mtx resistance in DHFR transformants is at least partially
conferred by varying degrees of gene amplification. Schimke, Cell,
37:705-713 (1984). The inadequacies of co-expression of the non-selected gene
have been reported by Wold et al., Proc. Natl. Acad. Sci. USA, 76:5684-5688
(1979). Instability of the amplified DNA is reported by Kaufman and
Schimke, Mol. Cell Biol., 1:1069-1076 (1981); Haber and Schimke, Cell,
26:355-362 (1981); and Federspiel et al., J. Biol. Chem., 259:9127-9140
(1984).

Several methods have been described for directly selecting such
recombinant host cells in a single step. One strategy involves
co-transfecting host cells with a product gene and a DHFR gene, and selecting
those cells that express high levels of DHFR by directly culturing in
medium containing a high concentration of Mtx. Many of the cells selected
in that manner also express the co-transfected product gene at high levels.
Page and Sydenham, Bio/Technology, 9:64 (1991). This method for single-step
selection suffers from certain drawbacks that limit its usefulness.
High-expressing cells obtained by direct culturing in medium containing a high
level of a selection agent may have poor growth and stability
characteristics, thus limiting their usefulness for long-term production
processes. Page and Sydenham, Bio/Technology, 9:64 (1991). Single-step
selection for high-level resistance to Mtx may produce cells with an
altered, Mtx-resistant DHFR enzyme, or cells that have altered Mtx
transport properties, rather than cells containing amplified genes. Haber
et al., J. Biol. Chem., 256:9501 (1981); Assaraf and Schimke, Proc. Natl.
Acad. Sci. USA, 84:7154 (1987).

Another method involves the use of polycistronic mRNA expression
vectors containing a product gene at the 5' end of the transcribed region
and a selectable gene at the 3' end. Because translation of the selectable
gene at the 3' end of the polycistronic mRNA is inefficient, such vectors
exhibit preferential translation of the product gene and require high
levels of polycistronic mRNA to survive selection. Kaufman, Meth.
Enzymol., 185:487 (1990); Kaufman, Meth. Enzymol., 185:537 (1990); Kaufman
et al., EMBO J., 6:187 (1987). Accordingly, cells expressing high levels
of the desired protein product may be obtained in a single step by
culturing the initial transfectants in medium containing a selection agent
appropriate for use with the particular selectable gene. However, the
utility of these vectors is variable because of the unpredictable influence
of the upstream product reading frame on selectable marker translation and
because the upstream reading frame sometimes becomes deleted during
methotrexate amplification (Kaufman et al., J. Mol. Biol., 159:601-621
[1982]; Levinson, Methods in Enzymology, San Diego: Academic Press, Inc.
[1990]). Later vectors incorporated an internal translation initiation site
derived from members of the picornavirus family which is positioned between
the product gene and the selectable gene (Pelletier et al., Nature, 334:320
[1988]; Jang et al., J. Virol., 63:1651 [1989]).

A third method for single-step selection involves use of a DNA
construct with a selectable gene containing an intron within which is
located a gene encoding the protein of interest. See U.S. Patent No.
5,043,270 and Abrams et al., J. Biol. Chem., 264(24):14016-14021 (1989).
In yet another single-step selection method, host cells are co-transfected
with an intron-modified selectable gene and a gene encoding the protein of
interest. See WO 92/17566, published October 15, 1992. The
intron-modified gene is prepared by inserting into the transcribed region of a
selectable gene an intron of such length that the intron is correctly
spliced from the corresponding mRNA precursor at low efficiency, so that
the amount of selectable marker produced from the intron-modified
selectable gene is substantially less than that produced from the starting
selectable gene. These vectors help to ensure the integrity of the
integrated DNA construct, but transcriptional linkage is not achieved
because the selectable gene and the protein gene are driven by separate
promoters.

Other mammalian expression vectors that have single transcription
units have been described. Retroviral vectors have been constructed (Cepko
et al., Cell, 37:1053-1062 [1984]) in which a cDNA is inserted between the
endogenous Moloney murine leukemia virus (M-MuLV) splice donor and splice
acceptor sites which are followed by a neomycin resistance gene. This
vector has been used to express a variety of gene products following
retroviral infection of several cell types.

With the above drawbacks in mind, it is one object of the present
invention to increase the level of homogeneity with regard to expression
levels of stable clones transfected with a product gene of interest, by
expressing a selectable marker (DHFR) and the protein of interest from a
single promoter.

It is another object to provide a method for selecting stable,
recombinant host cells that express high levels of a desired protein
product, which method is rapid and convenient to perform, and reduces the
numbers of transfected cells which need to be screened. Furthermore, it is
an object to allow high levels of single and two unit polypeptides to be
rapidly generated from clones or pools of stable host cell transfectants.

It is an additional object to provide expression vectors which bias
for active integration events (i.e. have an increased tendency to generate
transformants wherein the DNA construct is inserted into a region of the
genome of the host cell which results in high level expression of the
product gene) and can accommodate a variety of product genes without the
need for modification.

SUMMARY OF THE INVENTION

Accordingly, the present invention is directed to a DNA construct
(alternatively termed a DNA molecule) comprising a 5' transcriptional
initiation site and a 3' transcriptional termination site, a selectable
gene (preferably an amplifiable gene) and a product gene provided 3' to the
selectable gene, and a transcriptional regulatory region regulating
transcription of both the selectable gene and the product gene, the
selectable gene being positioned within an intron defined by a splice donor
site and a splice acceptor site. The splice donor site preferably comprises an
effective splice donor sequence as herein defined and thereby regulates
expression of the product gene using the transcriptional regulatory region.
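
The arrangement of elements recited above can be illustrated with a minimal sketch, offered only as a reading aid. It assumes each element is described by hypothetical start and end coordinates on the construct; the names `Element`, `check_construct` and `example`, and all coordinate values, are illustrative assumptions and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    name: str
    start: int  # hypothetical coordinate of the element's 5' end
    end: int    # hypothetical coordinate just past the element's 3' end


def check_construct(elements: Dict[str, Element]) -> List[str]:
    """Return the layout constraints that are violated (empty list if none)."""
    problems: List[str] = []
    sd, sa = elements["splice_donor"], elements["splice_acceptor"]
    sel, prod = elements["selectable_gene"], elements["product_gene"]
    init = elements["transcription_initiation"]
    term = elements["transcription_termination"]
    # The selectable gene must lie between the splice donor and acceptor.
    if not (sd.end <= sel.start and sel.end <= sa.start):
        problems.append("selectable gene is not inside the intron")
    # The product gene must be 3' to the selectable gene.
    if not sel.end <= prod.start:
        problems.append("product gene is not 3' to the selectable gene")
    # Everything sits between the initiation and termination sites.
    if not (init.start <= sd.start and prod.end <= term.start):
        problems.append("elements fall outside the single transcription unit")
    return problems


# Illustrative coordinates only (not taken from any figure or sequence).
example = {
    "transcription_initiation": Element("transcription_initiation", 0, 1),
    "splice_donor": Element("splice_donor", 50, 59),
    "selectable_gene": Element("selectable_gene", 100, 700),
    "splice_acceptor": Element("splice_acceptor", 750, 770),
    "product_gene": Element("product_gene", 800, 2400),
    "transcription_termination": Element("transcription_termination", 2500, 2600),
}
print(check_construct(example))
```

The check encodes only the ordering stated in the text: one transcription unit, the selectable gene inside the intron, and the product gene 3' to the selectable gene.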

In another embodiment, the invention provides a method for producing
a product of interest comprising culturing a eukaryotic cell which has been
transfected with the DNA construct described above, so as to express the
product gene and recovering the product.

In a further embodiment, the invention provides a method for
producing eukaryotic cells having multiple copies of the product gene
comprising transfecting eukaryotic cells with the DNA construct described
above (where the selectable gene is an amplifiable gene), growing the cells
in a selective medium comprising an amplifying agent for a sufficient time
for amplification to occur, and selecting cells having multiple copies of
the product gene. Preferably transfection of the cells is achieved using
electroporation.

After transfection of the host cells, most of the transfectants fail
to exhibit the selectable phenotype characteristic of the protein encoded
by the selectable gene, but surprisingly a small proportion of the
transfectants do exhibit the selectable phenotype, and among those
transfectants, the majority are found to express high levels of the desired
product encoded by the product gene. Thus, the invention provides an
improved method for the selection of recombinant host cells expressing high
levels of a desired product, which method is useful with a wide variety of
eukaryotic host cells and avoids the problems inherent in existing cell
selection technology.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A-1D illustrate schematically various DNA constructs
encompassed by the instant invention. The large arrows represent the
selectable gene and the product gene; the V formed by the dashed lines
shows the region of the precursor RNA internal to the 5' splice donor site
(SD) and 3' splice acceptor site (SA) that is excised from vectors that
contain a functional SD. The transcriptional regulatory region, selectable
gene, product gene and transcriptional termination site are depicted in
Figure 1A. Figure 1B depicts the DNA constructs of Example 1. The various
splice donor sequences are depicted, i.e., wild type ras splice donor
sequence (WT ras), mutant ras splice donor sequence (MUTANT ras) and
non-functional splice donor sequence (ΔGT). The probes used for Northern blot
analysis in Example 1 are shown in Figure 1B. Figure 1C depicts the DNA
constructs of Example 2 and Figure 1D depicts the DNA construct of Example
3 used for expression of anti-IgE VH.

Figure 2 depicts schematically the control DNA construct used in
Example 1.

Figures 3A-Q depict the nucleotide sequence (SEQ ID NO: 1) of the
DHFR/intron-(WT ras SD)-tPA expression vector of Example 1.

Figure 4 is a bar graph which shows the number of colonies that form
in selective medium after electroporation of linearized duplicate miniprep
DNAs prepared in parallel from the three vectors shown in Figure 1B (i.e.
with wild type ras splice donor sequence [WT ras], mutant ras splice donor
sequence [MUTANT ras] and non-functional splice donor sequence [ΔGT]) and
from the control vector that has DHFR under control of SV40 promoter and
tPA under control of CMV promoter (see Figure 2). Cells were selected in
nucleoside free medium and counted with an automated colony counter.

Figures 5A-C are bar graphs depicting expression of tPA from stable
pools and clones generated from the vectors shown in Figure 1B. In Figure
5A greater than 100 clones from each vector transfection were mixed, plated
in 24 well plates, and assayed by tPA ELISA at "saturation". In Figure 5B,
twenty clones chosen at random derived from each of the vectors were
assayed by tPA ELISA at "saturation". In Figure 5C, the pools mentioned in
Figure 5A (except the ΔGT pool) were exposed to 200 nM Mtx to select for
DHFR amplification and then pooled and assayed for tPA expression.

Figures 6A-P depict the nucleotide sequence (SEQ ID NO: 2) of the
DHFR/intron-(WT ras SD)-TNFr-IgG expression vector of Example 2.

Figures 7A-B are bar graphs depicting expression of TNFr-IgG using
dicistronic or control vectors (see Example 2). Vectors containing
TNFr-IgG (but otherwise identical to those described for tPA expression in
Example 1) were constructed (see Figure 1C), introduced into dp12.CHO cells
by electroporation, pooled, and assayed for product expression before
(Figure 7A) and after (Figure 7B) being subjected to amplification in 200 nM
Mtx.

Figure 8 depicts schematically the DNA construct used for expression
of the VL of anti-IgE in Example 3.

Figures 9A-O depict the nucleotide sequence (SEQ ID NO: 3) of the
anti-IgE VH expression vector of Example 3.

Figures 10A-Q depict the nucleotide sequence (SEQ ID NO: 4) of the
anti-IgE VL expression vector of Example 3.

Figure 11 is a bar graph depicting anti-IgE expression in Example 3.
Heavy (VH) and light (VL) chain expression vectors were constructed,
co-electroporated into CHO cells, and clones were selected and assayed for
antibody expression. Additionally, pools were established and assessed
with regard to expression before and after Mtx selection at 200 nM and 1 µM.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Definitions:

The "DNA construct" disclosed herein comprises a non-naturally
occurring DNA molecule which can either be provided as an isolate or
integrated in another DNA molecule, e.g. in an expression vector or the
chromosome of a eukaryotic host cell.

The term "selectable gene" as used herein refers to a DNA that
encodes a selectable marker necessary for the growth or survival of a host
cell under the particular cell culture conditions chosen. Accordingly, a
host cell that is transformed with a selectable gene will be capable of
growth or survival under certain cell culture conditions wherein a
non-transfected host cell is not capable of growth or survival. Typically, a
selectable gene will confer resistance to a drug or compensate for a
metabolic or catabolic defect in the host cell. Examples of selectable
genes are provided in the following table. See also Kaufman, Methods in
Enzymology, 185:537-566 (1990), for a review of these.



TABLE 1

Selectable Genes and their Selection Agents

Selection Agent                             Selectable Gene

Methotrexate                                Dihydrofolate reductase
Cadmium                                     Metallothionein
PALA                                        CAD
Xyl-A or adenosine and 2'-deoxycoformycin   Adenosine deaminase
Adenine, azaserine, and coformycin          Adenylate deaminase
6-Azauridine, pyrazofuran                   UMP synthetase
Mycophenolic acid                           IMP 5'-dehydrogenase
Mycophenolic acid with limiting xanthine    Xanthine-guanine
                                              phosphoribosyltransferase
Hypoxanthine, aminopterin, and thymidine    Mutant HGPRTase or mutant
  (HAT)                                       thymidine kinase
5-Fluorodeoxyuridine                        Thymidylate synthetase
Multiple drugs, e.g. adriamycin,            P-glycoprotein 170
  vincristine or colchicine
Aphidicolin                                 Ribonucleotide reductase
Methionine sulfoximine                      Glutamine synthetase
β-Aspartyl hydroxamate or Albizziin         Asparagine synthetase
Canavanine                                  Arginosuccinate synthetase
α-Difluoromethylornithine                   Ornithine decarboxylase
Compactin                                   HMG-CoA reductase
Tunicamycin                                 N-Acetylglucosaminyl transferase
Borrelidin                                  Threonyl-tRNA synthetase
Ouabain                                     Na+,K+-ATPase



The preferred selectable gene is an amplifiable gene. As used herein,
the term "amplifiable gene" refers to a gene which is amplified (i.e.
additional copies of the gene are generated which survive in
intrachromosomal or extrachromosomal form) under certain conditions. The
amplifiable gene usually encodes an enzyme (i.e. an amplifiable marker)
which is required for growth of eukaryotic cells under those conditions.
For example, the gene may encode DHFR which is amplified when a host cell
transformed therewith is grown in Mtx. According to Kaufman, the selectable
genes in Table 1 above can also be considered amplifiable genes. An example
of a selectable gene which is generally not considered to be an amplifiable
gene is the neomycin resistance gene (Cepko et al., supra).
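
As a reading aid, the pairing of selectable genes with their selection agents in Table 1 can be expressed as a simple lookup. The sketch below is illustrative only: it copies a subset of the table rows, and the names `SELECTION_AGENTS` and `selection_agents_for` are assumptions introduced here, not terms of the specification.

```python
from typing import Dict, List

# Subset of Table 1: selectable gene product -> selection agent(s).
# Keys are lowercased so that lookups are case-insensitive.
SELECTION_AGENTS: Dict[str, List[str]] = {
    "dihydrofolate reductase": ["methotrexate"],
    "metallothionein": ["cadmium"],
    "cad": ["PALA"],
    "thymidylate synthetase": ["5-fluorodeoxyuridine"],
    "ribonucleotide reductase": ["aphidicolin"],
    "glutamine synthetase": ["methionine sulfoximine"],
    "asparagine synthetase": ["beta-aspartyl hydroxamate", "albizziin"],
    "threonyl-trna synthetase": ["borrelidin"],
}


def selection_agents_for(gene: str) -> List[str]:
    """Return the agents listed for a gene, or [] if it is not in this subset."""
    return list(SELECTION_AGENTS.get(gene.strip().lower(), []))


print(selection_agents_for("Dihydrofolate reductase"))
```

An empty result means only that the gene is absent from this illustrative subset; it says nothing about whether the gene is selectable or amplifiable.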

As used herein, "selective medium" refers to nutrient solution used
for growing eukaryotic cells which have the selectable gene and therefore
includes a "selection agent". Commercially available media such as Ham's
F10 (Sigma), Minimal Essential Medium ([MEM], Sigma), RPMI-1640 (Sigma),
and Dulbecco's Modified Eagle's Medium ([DMEM], Sigma) are exemplary
nutrient solutions. In addition, any of the media described in Ham and
Wallace, Meth. Enz., 58:44 (1979), Barnes and Sato, Anal. Biochem., 102:255
(1980), U.S. Patent Nos. 4,767,704; 4,657,866; 4,927,762; or 4,560,655; WO
90/03430; WO 87/00195; U.S. Patent Re. 30,985; or U.S. Patent No.
5,122,469, the disclosures of all of which are incorporated herein by
reference, may be used as culture media. Any of these media may be
supplemented as necessary with hormones and/or other growth factors (such
as insulin, transferrin, or epidermal growth factor), salts (such as sodium
chloride, calcium, magnesium, and phosphate), buffers (such as HEPES),
nucleosides (such as adenosine and thymidine), antibiotics (such as
Gentamycin™ drug), trace elements (defined as inorganic compounds usually
present at final concentrations in the micromolar range), and glucose or
an equivalent energy source. Any other necessary supplements may also be
included at appropriate concentrations that would be known to those skilled
in the art. The preferred nutrient solution comprises fetal bovine serum.

The term "selection agent" refers to a substance that interferes with
the growth or survival of a host cell that is deficient in a particular
selectable gene. Examples of selection agents are presented in Table 1
above. The selection agent preferably comprises an "amplifying agent" which
is defined for purposes herein as an agent for amplifying copies of the
amplifiable gene, such as Mtx if the amplifiable gene is DHFR. See Table
1 for examples of amplifying agents.

As used herein, the term "transcriptional initiation site" refers to
the nucleic acid in the DNA construct corresponding to the first nucleic
acid incorporated into the primary transcript, i.e., the mRNA precursor,
which site is generally provided at, or adjacent to, the 5' end of the DNA
construct.

The term "transcriptional termination site" refers to a sequence of 
DNA, normally represented at the 3' end of the DNA construct, that causes 
RNA polymerase to terminate transcription. 

As used herein, "transcriptional regulatory region" refers to a
region of the DNA construct that regulates transcription of the selectable
gene and the product gene. The transcriptional regulatory region normally
refers to a promoter sequence (i.e. a region of DNA involved in binding of
RNA polymerase to initiate transcription) which can be constitutive or
inducible and, optionally, an enhancer (i.e. a cis-acting DNA element,
usually from about 10-300 bp, that acts on a promoter to increase its
transcription).

As used herein, "product gene" refers to DNA that encodes a desired
protein or polypeptide product. Any product gene that is capable of
expression in a host cell may be used, although the methods of the
invention are particularly suited for obtaining high-level expression of
a product gene that is not also a selectable or amplifiable gene.
Accordingly, the protein or polypeptide encoded by a product gene typically
will be one that is not necessary for the growth or survival of a host cell
under the particular cell culture conditions chosen. For example, product
genes suitably encode a peptide, or may encode a polypeptide sequence of
amino acids for which the chain length is sufficient to produce higher
levels of tertiary and/or quaternary structure.

Examples of bacterial polypeptides or proteins include, e.g.,
alkaline phosphatase and β-lactamase. Examples of mammalian polypeptides
or proteins include molecules such as renin; a growth hormone, including
human growth hormone, and bovine growth hormone; growth hormone releasing
factor; parathyroid hormone; thyroid stimulating hormone; lipoproteins;
alpha-1-antitrypsin; insulin A-chain; insulin B-chain; proinsulin; follicle
stimulating hormone; calcitonin; luteinizing hormone; glucagon; clotting
factors such as factor VIIIC, factor IX, tissue factor, and von Willebrands
factor; anti-clotting factors such as Protein C; atrial natriuretic factor;
lung surfactant; a plasminogen activator, such as urokinase or human urine
or tissue-type plasminogen activator (t-PA); bombesin; thrombin;
hemopoietic growth factor; tumor necrosis factor-alpha and -beta;
enkephalinase; RANTES (regulated on activation normally T-cell expressed
and secreted); human macrophage inflammatory protein (MIP-1-alpha); a serum
albumin such as human serum albumin; mullerian-inhibiting substance;
relaxin A-chain; relaxin B-chain; prorelaxin; mouse gonadotropin-associated
peptide; a microbial protein, such as beta-lactamase; DNase; inhibin;
activin; vascular endothelial growth factor (VEGF); receptors for hormones
or growth factors; integrin; protein A or D; rheumatoid factors; a
neurotrophic factor such as brain-derived neurotrophic factor (BDNF),
neurotrophin-3, -4, -5, or -6 (NT-3, NT-4, NT-5, or NT-6), or a nerve
growth factor such as NGF-β; platelet-derived growth factor (PDGF);
fibroblast growth factor such as aFGF and bFGF; epidermal growth factor
(EGF); transforming growth factor (TGF) such as TGF-alpha and TGF-beta,
including TGF-β1, TGF-β2, TGF-β3, TGF-β4, or TGF-β5; insulin-like growth
factor-I and -II (IGF-I and IGF-II); des(1-3)-IGF-I (brain IGF-I),
insulin-like growth factor binding proteins; CD proteins such as CD-3, CD-4,
CD-8, and CD-19; erythropoietin; osteoinductive factors; immunotoxins; a bone
morphogenetic protein (BMP); an interferon such as interferon-alpha, -beta,
and -gamma; colony stimulating factors (CSFs), e.g., M-CSF, GM-CSF, and
G-CSF; interleukins (ILs), e.g., IL-1 to IL-10; superoxide dismutase; T-cell
receptors; surface membrane proteins; decay accelerating factor; viral
antigen such as, for example, a portion of the AIDS envelope; transport
proteins; homing receptors; addressins; regulatory proteins; antibodies;
chimeric proteins such as immunoadhesins; and fragments of any of the
above-listed polypeptides.

The product gene preferably does not consist of an anti-sense
sequence for inhibiting the expression of a gene present in the host.
Preferred proteins herein are therapeutic proteins such as TGF-β, TGF-α,
PDGF, EGF, FGF, IGF-I, DNase, plasminogen activators such as t-PA, clotting
factors such as tissue factor and factor VIII, hormones such as relaxin and
insulin, cytokines such as IFN-γ, chimeric proteins such as TNF receptor
IgG immunoadhesin (TNFr-IgG) or antibodies such as anti-IgE.

The term "intron" as used herein refers to a nucleotide sequence
present within the transcribed region of a gene or within a messenger RNA
precursor, which nucleotide sequence is capable of being excised, or
spliced, from the messenger RNA precursor by a host cell prior to
translation. Introns suitable for use in the present invention are
suitably prepared by any of several methods that are well known in the art,
such as purification from a naturally occurring nucleic acid or de novo
synthesis. The introns present in many naturally occurring eukaryotic
genes have been identified and characterized. Mount, Nuc. Acids Res.,
10:459 (1982). Artificial introns comprising functional splice sites also
have been described. Winey et al., Mol. Cell Biol., 9:329 (1989);
Gatermann et al., Mol. Cell Biol., 9:1526 (1989). Introns may be obtained
from naturally occurring nucleic acids, for example, by digestion of a
naturally occurring nucleic acid with a suitable restriction endonuclease,
or by PCR cloning using primers complementary to sequences at the 5' and
3' ends of the intron. Alternatively, introns of defined sequence and
length may be prepared synthetically using various methods in organic
chemistry. Narang et al., Meth. Enzymol., 68:90 (1979); Caruthers et al.,
Meth. Enzymol., 154:287 (1985); Froehler et al., Nuc. Acids Res., 14:5399
(1986).

As used herein, "splice donor site" or "SD" refers to the DNA sequence
immediately surrounding the exon-intron boundary at the 5' end of the
intron, where the "exon" comprises the nucleic acid 5' to the intron. Many
splice donor sites have been characterized and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. An "efficient splice
donor sequence" refers to a nucleic acid sequence encoding a splice donor
site wherein the efficiency of splicing of messenger RNA precursors having
the splice donor sequence is between about 80 and 99%, and preferably 90 to
95%, as determined by quantitative PCR. Examples of efficient splice donor
sequences include the wild type (WT) ras splice donor sequence and the
GAC:GTAAGT sequence of Example 3. Other efficient splice donor sequences
can be readily selected using the techniques for measuring the efficiency
of splicing disclosed herein.

The terms "PCR" and "polymerase chain reaction" as used herein refer
to the in vitro amplification method described in US Patent No. 4,683,195
(issued July 28, 1987). In general, the PCR method involves repeated
cycles of primer extension synthesis, using two DNA primers capable of
hybridizing preferentially to a template nucleic acid comprising the
nucleotide sequence to be amplified. The PCR method can be used to clone
specific DNA sequences from total genomic DNA, cDNA transcribed from
cellular RNA, viral or plasmid DNAs. Wang & Mark, in PCR Protocols,
pp. 70-75 (Academic Press, 1990); Scharf, in PCR Protocols, pp. 84-98;
Kawasaki & Wang, in PCR Technology, pp. 89-97 (Stockton Press, 1989).
Reverse transcription-polymerase chain reaction (RT-PCR) can be used to
analyze RNA samples containing mixtures of spliced and unspliced mRNA
transcripts. Fluorescently tagged primers designed to span the intron are
used to amplify both spliced and unspliced targets. The resultant
amplification products are then separated by gel electrophoresis and
quantitated by measuring the fluorescent emission of the appropriate
band(s). A comparison is made to determine the amount of spliced and
unspliced transcripts present in the RNA sample.
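
The comparison of spliced and unspliced band intensities reduces to a
simple ratio. The following Python sketch is illustrative only and is not
part of the disclosed method: the function names are hypothetical, and it
assumes that the fluorescent emission of each band is proportional to the
amount of the corresponding transcript, which can be expected only within
the linear range of the amplification.

```python
def splicing_efficiency(spliced_signal: float, unspliced_signal: float) -> float:
    """Return the spliced fraction (0.0 to 1.0) from two band intensities.

    Assumption: each intensity is proportional to transcript abundance.
    """
    if spliced_signal < 0 or unspliced_signal < 0:
        raise ValueError("band intensities must be non-negative")
    total = spliced_signal + unspliced_signal
    if total == 0:
        raise ValueError("no signal detected in either band")
    return spliced_signal / total


def is_efficient(efficiency: float, low: float = 0.80, high: float = 0.99) -> bool:
    """Check an efficiency against the 'about 80 to 99%' range used herein."""
    return low <= efficiency <= high


# Example: a spliced band of 900 units and an unspliced band of 100 units.
eff = splicing_efficiency(900.0, 100.0)
print(f"splicing efficiency: {eff:.0%}, efficient: {is_efficient(eff)}")
```

Because the stated range is qualified by "about", the hard cut-offs in
`is_efficient` are a simplification chosen for the sketch.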

One preferred splice donor sequence is a "consensus splice donor
sequence". The nucleotide sequences surrounding intron splice sites, which
sequences are evolutionarily highly conserved, are referred to as
"consensus splice donor sequences". In the mRNAs of higher eukaryotes, the
5' splice site occurs within the consensus sequence AG:GUAAGU (wherein the
colon denotes the site of cleavage and ligation). In the mRNAs of yeast,
the 5' splice site is bounded by the consensus sequence :GUAUGU. Padgett,
et al., Ann. Rev. Biochem., 55:1119 (1986).

The expression "splice acceptor site" or "SA" refers to the sequence
immediately surrounding the intron-exon boundary at the 3' end of the
intron, where the "exon" comprises the nucleic acid 3' to the intron. Many
splice acceptor sites have been characterized and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. The preferred splice
acceptor site is an efficient splice acceptor site, which refers to a
nucleic acid sequence encoding a splice acceptor site wherein the
efficiency of splicing of messenger RNA precursors having the splice
acceptor site is between about 80 and 99%, and preferably 90 to 95%, as
determined by quantitative PCR. The splice acceptor site may comprise a
consensus sequence. In the mRNAs of higher eukaryotes, the 3' splice
acceptor site occurs within the consensus sequence (U/C)nNCAG:G. In the
mRNAs of yeast, the 3' acceptor splice site is bounded by the consensus
sequence (C/U)AG:. Padgett, et al., supra.
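
The higher eukaryote consensus sequences for the 5' and 3' splice sites
can be compared against a candidate sequence with a short script. The
following Python sketch is illustrative only: the function names are
hypothetical, the minimum pyrimidine tract length of 8 is an arbitrary
choice (the consensus leaves n unspecified), and agreement with a consensus
does not by itself establish splicing efficiency, which is measured by
quantitative PCR as described herein.

```python
import re

DONOR_CONSENSUS = "AGGUAAGU"  # AG:GUAAGU, with cleavage after the second base


def to_rna(seq: str) -> str:
    """Normalise a DNA or RNA string to upper-case RNA letters."""
    return seq.upper().replace("T", "U")


def donor_match_count(site: str) -> int:
    """Count positions of an 8-base candidate that agree with AG:GUAAGU."""
    rna = to_rna(site)
    if len(rna) != len(DONOR_CONSENSUS):
        raise ValueError("candidate must be 8 bases long")
    return sum(a == b for a, b in zip(rna, DONOR_CONSENSUS))


def find_acceptor_sites(seq: str, min_pyrimidines: int = 8) -> list:
    """Return indices of the base just 3' of each (U/C)nNCAG:G cleavage site.

    min_pyrimidines is an assumed lower bound for n, not a value from the text.
    """
    pattern = re.compile(r"[UC]{%d,}[ACGU]CAG(?=G)" % min_pyrimidines)
    return [m.end() for m in pattern.finditer(to_rna(seq))]


print(donor_match_count("AGGTAAGT"))               # full agreement
print(find_acceptor_sites("AAUUUCCUUUCCACAGGAA"))  # exon begins at the index
```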

As used herein, "culturing for sufficient time to allow amplification
to occur" refers to the act of physically culturing the eukaryotic host
cells which have been transformed with the DNA construct in cell culture
media containing the amplifying agent, until the copy number of the
amplifiable gene (and preferably also the copy number of the product gene)
in the host cells has increased relative to the transformed cells prior to
this culturing.

The term "expression" as used herein refers to transcription or
translation occurring within a host cell. The level of expression of a
product gene in a host cell may be determined on the basis of either the
amount of corresponding mRNA that is present in the cell or the amount of
the protein encoded by the product gene that is produced by the cell. For
example, mRNA transcribed from a product gene is desirably quantitated by
northern hybridization. Sambrook, et al., Molecular Cloning: A Laboratory
Manual, pp. 7.3-7.57 (Cold Spring Harbor Laboratory Press, 1989). Protein
encoded by a product gene can be quantitated either by assaying for the
biological activity of the protein or by employing assays that are
independent of such activity, such as western blotting or radioimmunoassay
using antibodies that are capable of reacting with the protein. Sambrook,
et al., Molecular Cloning: A Laboratory Manual, pp. 18.1-18.88 (Cold Spring
Harbor Laboratory Press, 1989).

Modes for Carrying Out the Invention 

Methods and compositions are provided for enhancing the stability
and/or copy number of a transcribed sequence in order to allow for elevated
levels of an RNA sequence of interest. In general, the methods of the
present invention involve transfecting a eukaryotic host cell with an
expression vector comprising both a product gene encoding a desired
polypeptide and a selectable gene (preferably an amplifiable gene).

Selectable genes and product genes may be obtained from genomic DNA,
cDNA transcribed from cellular RNA, or by in vitro synthesis. For example,
libraries are screened with probes (such as antibodies or oligonucleotides
of about 20-80 bases) designed to identify the selectable gene or the
product gene (or the protein(s) encoded thereby). Screening the cDNA or
genomic library with the selected probe may be conducted using standard
procedures as described in chapters 10-12 of Sambrook et al., Molecular
Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory
Press, 1989). An alternative means to isolate the selectable gene or
product gene is to use PCR methodology as described in section 14 of
Sambrook et al., supra.

A preferred method of practicing this invention is to use carefully
selected oligonucleotide sequences to screen cDNA libraries from various
tissues known to contain the selectable gene or product gene. The
oligonucleotide sequences selected as probes should be of sufficient length
and sufficiently unambiguous that false positives are minimized.

The oligonucleotide generally is labeled such that it can be detected
upon hybridization to DNA in the library being screened. The preferred
method of labeling is to use 32P-labeled ATP with polynucleotide kinase,
as is well known in the art, to radiolabel the oligonucleotide. However,
other methods may be used to label the oligonucleotide, including, but not
limited to, biotinylation or enzyme labeling.

Sometimes, the DNA encoding the selectable gene and product gene is
preceded by DNA encoding a signal sequence having a specific cleavage site
at the N-terminus of the mature protein or polypeptide. In general, the
signal sequence may be a component of the expression vector, or it may be
a part of the selectable gene or product gene that is inserted into the
expression vector. If a heterologous signal sequence is used, it
preferably is one that is recognized and processed (i.e., cleaved by a
signal peptidase) by the host cell. For yeast secretion the native signal
sequence may be substituted by, e.g., the yeast invertase leader, alpha
factor leader (including Saccharomyces and Kluyveromyces α-factor leaders,
the latter described in U.S. Pat. No. 5,010,182 issued 23 April 1991), or
acid phosphatase leader, the C. albicans glucoamylase leader (EP 362,179
published 4 April 1990), or the signal described in WO 90/13646 published
15 November 1990. In mammalian cell expression the native signal sequence
of the protein of interest is satisfactory, although other mammalian signal
sequences may be suitable, such as signal sequences from secreted
polypeptides of the same or related species, as well as viral secretory
leaders, for example, the herpes simplex gD signal. The DNA for such
precursor region is ligated in reading frame to the selectable gene or
product gene.

As shown in Figure 1A, the selectable gene is generally provided at
the 5' end of the DNA construct and this selectable gene is followed by the
product gene. Therefore, the full length (non-spliced) message will contain
DHFR as the first open reading frame and will therefore generate DHFR
protein to allow selection of stable transfectants. The full length message
is not expected to generate appreciable amounts of the protein of interest
as the second AUG in a dicistronic message is an inefficient initiator of
translation in mammalian cells (Kozak, J. Cell Biol., 115:887-903 [1991]).
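
The consequence of this arrangement can be illustrated with a toy model in
Python. The sketch below is not taken from the Examples: all sequences and
names are hypothetical, and translation initiation is reduced to "the first
AUG is used", which ignores sequence context, leaky scanning, and
reinitiation. Under those assumptions, the non-spliced message yields the
selectable gene as its first open reading frame, whereas the spliced
message yields the product gene.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_orf(mrna: str) -> str:
    """Return the reading frame opened by the first AUG, through its stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return mrna[start:]  # no in-frame stop codon found


# Hypothetical toy sequences (far shorter than any real gene).
LEADER = "GCCACC"                      # exon sequence 5' of the splice donor
INTRON_5 = "GUAAGU"                    # intron side of the splice donor
SELECTABLE_ORF = "AUGGUUCGACCAUUGUAA"  # stands in for the selectable gene
INTRON_3 = "UUUCCUUUCCACAG"            # pyrimidine tract and splice acceptor
PRODUCT_ORF = "AUGGGAUCCUACUGA"        # stands in for the product gene

intron = INTRON_5 + SELECTABLE_ORF + INTRON_3
full_length = LEADER + intron + PRODUCT_ORF
spliced = full_length.replace(intron, "", 1)  # intron excised, exons joined

print(first_orf(full_length) == SELECTABLE_ORF)  # True
print(first_orf(spliced) == PRODUCT_ORF)         # True
```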

The selectable gene is positioned within an intron. Introns are
noncoding nucleotide sequences, normally present within many eukaryotic
genes, which are removed from newly transcribed mRNA precursors in a
multiple-step process collectively referred to as splicing.

A single mechanism is thought to be responsible for the splicing of 
mRNA precursors in mammalian, plant, and yeast cells. In general, the 
process of splicing requires that the 5' and 3' ends of the intron be 
correctly cleaved and the resulting ends of the mRNA be accurately joined, 
such that a mature mRNA having the proper reading frame for protein 
synthesis is produced. Analysis of a variety of naturally occurring and 
synthetically constructed mutant genes has shown that nucleotide changes 
at many of the positions within the consensus sequences at the 5' and 3' 
splice sites have the effect of reducing or abolishing the synthesis of 
mature mRNA. Sharp, Science, 235:766 (1987); Padgett, et al., Ann. Rev.
Biochem., 55:1119 (1986); Green, Ann. Rev. Genet., 20:671 (1986).
Mutational studies also have shown that RNA secondary structures involving
splicing sites can affect the efficiency of splicing. Solnick, Cell,
43:667 (1985); Konarska, et al., Cell, 42:165 (1985).

The length of the intron may also affect the efficiency of splicing. 
By making deletion mutations of different sizes within the large intron of 
the rabbit beta-globin gene, Wieringa, et al. determined that the minimum
intron length necessary for correct splicing is about 69 nucleotides.
Cell, 37:915 (1984). Similar studies of the intron of the adenovirus E1A
region have shown that an intron length of about 78 nucleotides allows
correct splicing to occur, but at reduced efficiency. Increasing the
length of the intron to 91 nucleotides restores normal splicing efficiency,
whereas truncating the intron to 63 nucleotides abolishes correct splicing.
Ulfendahl, et al., Nuc. Acids Res., 13:6299 (1985).

To be useful in the invention, the intron must have a length such
that splicing of the intron from the mRNA is efficient. The preparation of
introns of differing lengths is a routine matter, involving methods well
known in the art, such as de novo synthesis or in vitro deletion
mutagenesis of an existing intron. Typically, the intron will have a length
of at least about 150 nucleotides, since introns which are shorter than
this tend to be spliced less efficiently. The upper limit for the length
of the intron can be up to 30 kb or more. However, as a general
proposition, the intron is less than about 10 kb in length.
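
These length guidelines can be applied as a quick check when planning a
construct. The Python sketch below is illustrative only: the names are
hypothetical, the example lengths are invented, and the hard cut-offs are a
simplification because the limits above are stated as approximate. It
assumes that the final intron length is the sum of the intron sequence
retained and the selectable gene placed within it.

```python
MIN_EFFICIENT_INTRON_NT = 150   # "at least about 150 nucleotides"
GENERAL_MAX_INTRON_NT = 10_000  # "less than about 10 kb"


def modified_intron_length(retained_intron_nt: int, selectable_gene_nt: int) -> int:
    """Length of the intron once it carries the selectable gene."""
    return retained_intron_nt + selectable_gene_nt


def within_typical_range(intron_nt: int) -> bool:
    """True if the length falls inside the typical range described above."""
    return MIN_EFFICIENT_INTRON_NT <= intron_nt < GENERAL_MAX_INTRON_NT


# Example with invented numbers: 300 nt of intron plus a 700 nt insert.
length = modified_intron_length(300, 700)
print(length, within_typical_range(length))
```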

The intron is modified to contain the selectable gene not normally
present within the intron using any of the various known methods for
modifying a nucleic acid in vitro. Typically, a selectable gene will be
introduced into an intron by first cleaving the intron with a restriction
endonuclease, and then covalently joining the resulting restriction
fragments to the selectable gene in the correct orientation for host cell
expression, for example by ligation with a DNA ligase enzyme.
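
The cleavage and ligation steps can be modelled at the level of the
top-strand sequence. The Python sketch below is illustrative only: the
sequences and the function name are hypothetical, EcoRI (recognition site
GAATTC, cut after the first G) is chosen merely as a familiar example, and
only insertion in the forward orientation is modelled. It assumes the
selectable gene is supplied as a fragment with compatible cohesive ends, so
that the recognition site is regenerated at both junctions.

```python
def insert_at_unique_site(intron: str, gene: str,
                          site: str = "GAATTC", cut_offset: int = 1) -> str:
    """Insert `gene` at the single `site` in `intron` (top strand, 5' to 3')."""
    if intron.count(site) != 1:
        raise ValueError("the intron must contain exactly one recognition site")
    if site in gene:
        raise ValueError("the gene must not contain the recognition site")
    cut = intron.index(site) + cut_offset
    left, right = intron[:cut], intron[cut:]
    # A fragment with compatible ends reads AATTC + gene + G on the top strand.
    fragment = site[cut_offset:] + gene + site[:cut_offset]
    return left + fragment + right


# Hypothetical toy sequences: splice donor, spacer, EcoRI site, splice acceptor.
toy_intron = "GTAAGT" + "CCC" + "GAATTC" + "TTTCCTTTCCACAG"
toy_gene = "ATGGTTCGACCATTGTAA"

modified = insert_at_unique_site(toy_intron, toy_gene)
print(modified)
```

The splice donor and splice acceptor sequences at the ends of the toy
intron are left untouched by the insertion, which is the property the
construct relies on.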

The DNA construct is dicistronic, i.e., the selectable gene and
product gene are both under the transcriptional control of a single
transcriptional regulatory region. As mentioned above, the transcriptional
regulatory region comprises a promoter. Suitable promoting sequences for
use with yeast hosts include the promoters for 3-phosphoglycerate kinase
(Hitzeman et al., J. Biol. Chem., 255:2073 [1980]) or other glycolytic
enzymes (Hess et al., J. Adv. Enzyme Reg., 7:149 [1968]; and Holland,
Biochemistry, 17:4900 [1978]), such as enolase, glyceraldehyde-3-phosphate
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase,
glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase,
triosephosphate isomerase, phosphoglucose isomerase, and glucokinase.

Other yeast promoters, which are inducible promoters having the
additional advantage of transcription controlled by growth conditions, are
the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid
phosphatase, degradative enzymes associated with nitrogen metabolism,
metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes
responsible for maltose and galactose utilization. Suitable vectors and
promoters for use in yeast expression are further described in Hitzeman et
al., EP 73,657A. Yeast enhancers also are advantageously used with yeast
promoters.

Expression control sequences are known for eukaryotes. Virtually all
eukaryotic genes have an AT-rich region located approximately 25 to 30
bases upstream from the site where transcription is initiated. Another
sequence found 70 to 80 bases upstream from the start of transcription of
many genes is a CXCAAT region where X may be any nucleotide.
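
Both elements can be located in a candidate upstream sequence with a short
script. The Python sketch below is illustrative only: the function names
and the test sequence are hypothetical, and it assumes the input is the DNA
immediately 5' of the transcription start site, so that its last base is
position -1.

```python
import re


def at_fraction(upstream: str, far: int = 30, near: int = 25) -> float:
    """Fraction of A/T bases between positions -far and -near, inclusive."""
    seq = upstream.upper()
    n = len(seq)
    if n < far:
        raise ValueError("upstream sequence is shorter than the window")
    window = seq[n - far:n - near + 1]
    return sum(base in "AT" for base in window) / len(window)


def cxcaat_positions(upstream: str) -> list:
    """Distances upstream of the start site at which a CXCAAT match begins."""
    seq = upstream.upper()
    n = len(seq)
    # The lookahead lets overlapping matches be reported as well.
    return [n - m.start() for m in re.finditer(r"(?=C[ACGT]CAAT)", seq)]


# Hypothetical 100-base upstream region built to contain both elements.
upstream = "G" * 25 + "CGCAAT" + "G" * 39 + "TATAAA" + "G" * 24

print(at_fraction(upstream))
print([d for d in cxcaat_positions(upstream) if 70 <= d <= 80])
```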

Product gene transcription from vectors in mammalian host cells is
controlled by promoters obtained from the genomes of viruses such as
polyoma virus, fowlpox virus (UK 2,211,504 published 5 July 1989),
adenovirus (such as Adenovirus 2), bovine papilloma virus, avian sarcoma
virus, cytomegalovirus, a retrovirus, hepatitis-B virus and most preferably
Simian Virus 40 (SV40), from heterologous mammalian promoters, e.g., the
actin promoter or an immunoglobulin promoter, from heat-shock promoters,
and from the promoter normally associated with the product gene, provided
such promoters are compatible with the host cell systems.

The early and late promoters of the SV40 virus are conveniently
obtained as an SV40 restriction fragment that also contains the SV40 viral
origin of replication. Fiers et al., Nature, 273:113 (1978); Mulligan and
Berg, Science, 209:1422-1427 (1980); Pavlakis et al., Proc. Natl. Acad.
Sci. USA, 78:7398-7402 (1981). The immediate early promoter of the human
cytomegalovirus (CMV) is conveniently obtained as a HindIII E restriction
fragment. Greenaway et al., Gene, 18:355-360 (1982). A system for
expressing DNA in mammalian hosts using the bovine papilloma virus as a
vector is disclosed in U.S. 4,419,446. A modification of this system is
described in U.S. 4,601,978. See also Gray et al., Nature, 295:503-508
(1982) on expressing cDNA encoding immune interferon in monkey cells;
Reyes et al., Nature, 297:598-601 (1982) on expression of human
β-interferon cDNA in mouse cells under the control of a thymidine kinase
promoter from herpes simplex virus; Canaani and Berg, Proc. Natl. Acad.
Sci. USA, 79:5166-5170 (1982) on expression of the human interferon β1 gene
in cultured mouse and rabbit cells; and Gorman et al., Proc. Natl. Acad.
Sci. USA, 79:6777-6781 (1982) on expression of bacterial CAT sequences in
CV-1 monkey kidney cells, chicken embryo fibroblasts, Chinese hamster ovary
cells, HeLa cells, and mouse NIH-3T3 cells using the Rous sarcoma virus
long terminal repeat as a promoter.

Preferably the transcriptional regulatory region in higher eukaryotes
comprises an enhancer sequence. Enhancers are relatively orientation and
position independent, having been found 5' (Laimins et al., Proc. Natl.
Acad. Sci. USA, 78:993 [1981]) and 3' (Lusky et al., Mol. Cell Bio., 3:1108
[1983]) to the transcription unit, within an intron (Banerji et al., Cell,
33:729 [1983]) as well as within the coding sequence itself (Osborne et
al., Mol. Cell Bio., 4:1293 [1984]). Many enhancer sequences are now known
from mammalian genes (globin, elastase, albumin, α-fetoprotein and
insulin). Typically, however, one will use an enhancer from a eukaryotic
cell virus. Examples include the SV40 enhancer on the late side of the
replication origin (bp 100-270), the cytomegalovirus early promoter
enhancer (CMV), the polyoma enhancer on the late side of the replication
origin, and adenovirus enhancers. See also Yaniv, Nature, 297:17-18 (1982)
on enhancing elements for activation of eukaryotic promoters. The enhancer
may be spliced into the vector at a position 5' or 3' to the product gene,
but is preferably located at a site 5' from the promoter.

The DNA construct has a transcriptional initiation site following the
transcriptional regulatory region and a transcriptional termination region
following the product gene (see Figure 1A). These sequences are provided
in the DNA construct using techniques which are well known in the art.

The DNA construct normally forms part of an expression vector which
may have other components such as an origin of replication (i.e., a nucleic
acid sequence that enables the vector to replicate in one or more selected
host cells) and, if desired, one or more additional selectable gene(s).
Construction of suitable vectors containing the desired coding and control
sequences employs standard ligation techniques. Isolated plasmids or DNA
fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required.

Generally, in cloning vectors the origin of replication is one that
enables the vector to replicate independently of the host chromosomal DNA,
and includes origins of replication or autonomously replicating sequences.
Such sequences are well known. The 2μ plasmid origin of replication is
suitable for yeast, and various viral origins (SV40, polyoma, adenovirus,
VSV or BPV) are useful for cloning vectors in mammalian cells. Generally,
the origin of replication component is not needed for mammalian expression
vectors (the SV40 origin may typically be used only because it contains the
early promoter).

Most expression vectors are "shuttle" vectors, i.e., they are capable 
of replication in at least one class of organisms but can be transfected 
into another organism for expression. For example, a vector is cloned in 
E. coli and then the same vector is transfected into yeast or mammalian
cells for expression even though it is not capable of replicating 
independently of the host cell chromosome. 

For analysis to confirm correct sequences in plasmids constructed,
plasmids from the transformants are prepared, analyzed by restriction,
and/or sequenced by the method of Messing et al., Nucleic Acids Res., 9:309
(1981) or by the method of Maxam et al., Methods in Enzymology, 65:499
(1980).

The expression vector having the DNA construct prepared as discussed
above is transformed into a eukaryotic host cell. Suitable host cells for
cloning or expressing the vectors herein are yeast or higher eukaryote
cells.

Eukaryotic microbes such as filamentous fungi or yeast are suitable
hosts for vectors containing the product gene. Saccharomyces cerevisiae,
or common baker's yeast, is the most commonly used among lower eukaryotic
host microorganisms. However, a number of other genera, species, and
strains are commonly available and useful herein, such as S. pombe [Beach
and Nurse, Nature, 290:140 (1981)], Kluyveromyces lactis [Louvencourt et
al., J. Bacteriol., 737 (1983)], Yarrowia [EP 402,226], Pichia pastoris [EP
183,070], Trichoderma reesei [EP 244,234], Neurospora crassa [Case et al.,
Proc. Natl. Acad. Sci. USA, 76:5259-5263 (1979)], and Aspergillus hosts
such as A. nidulans [Ballance et al., Biochem. Biophys. Res. Commun.,
112:284-289 (1983); Tilburn et al., Gene, 26:205-221 (1983); Yelton et al.,
Proc. Natl. Acad. Sci. USA, 81:1470-1474 (1984)] and A. niger [Kelly and
Hynes, EMBO J., 4:475-479 (1985)].

Suitable host cells for the expression of the product gene are
derived from multicellular organisms. Such host cells are capable of
complex processing and glycosylation activities. In principle, any higher
eukaryotic cell culture is workable, whether from vertebrate or
invertebrate culture. Examples of invertebrate cells include plant and
insect cells. Numerous baculoviral strains and variants and corresponding
permissive insect host cells from hosts such as Spodoptera frugiperda
(caterpillar), Aedes aegypti (mosquito), Aedes albopictus (mosquito),
Drosophila melanogaster (fruit fly), and Bombyx mori host cells have been
identified. See, e.g., Luckow et al., Bio/Technology, 6:47-55 (1988);
Miller et al., in Genetic Engineering, Setlow, J.K. et al., eds., Vol. 8
(Plenum Publishing, 1986), pp. 277-279; and Maeda et al., Nature,
315:592-594 (1985). A variety of such viral strains are publicly available,
e.g., the L-1 variant of Autographa californica NPV and the Bm-5 strain of
Bombyx mori NPV, and such viruses may be used as the virus herein according
to the present invention, particularly for transfection of Spodoptera
frugiperda cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia,
tomato, and tobacco can be utilized as hosts. Typically, plant cells are
transfected by incubation with certain strains of the bacterium
Agrobacterium tumefaciens, which has been previously manipulated to contain
the product gene. During incubation of the plant cell culture with A.
tumefaciens, the product gene is transferred to the plant cell host such
that it is transfected, and will, under appropriate conditions, express the
product gene. In addition, regulatory and signal sequences compatible with
plant cells are available, such as the nopaline synthase promoter and
polyadenylation signal sequences. Depicker et al., J. Mol. Appl. Gen.,
1:561 (1982). In addition, DNA segments isolated from the upstream region
of the T-DNA 780 gene are capable of activating or increasing transcription
levels of plant-expressible genes in recombinant DNA-containing plant
tissue. EP 321,196 published 21 June 1989.

However, interest has been greatest in vertebrate cells, and
propagation of vertebrate cells in culture (tissue culture) has become a
routine procedure in recent years [Tissue Culture, Academic Press, Kruse
and Patterson, editors (1973)]. Examples of useful mammalian host cell
lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL
1651); human embryonic kidney line (293 or 293 cells subcloned for growth
in suspension culture, Graham et al., J. Gen Virol., 36:59 [1977]); baby
hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells/-DHFR
(CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 [1980]);
dp12.CHO cells (EP 307,247 published 15 March 1989); mouse Sertoli cells
(TM4, Mather, Biol. Reprod., 23:243-251 [1980]); monkey kidney cells (CV1
ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587);
human cervical carcinoma cells (HELA, ATCC CCL 2); canine kidney cells
(MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human
lung cells (W138, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse
mammary tumor (MMT 060562, ATCC CCL 51); TRI cells (Mather et al., Annals
N.Y. Acad. Sci., 383:44-68 [1982]); MRC 5 cells; FS4 cells; and a human
hepatoma line (Hep G2).

Host cells are transformed with the above-described expression or
cloning vectors of this invention and cultured in conventional nutrient
media modified as appropriate for inducing promoters, selecting
transformants, or amplifying the genes encoding the desired sequences.

Infection with Agrobacterium tumefaciens is used for transformation
of certain plant cells, as described by Shaw et al., Gene, 23:315 (1983)
and WO 89/05859 published 29 June 1989. For mammalian cells without such
cell walls, the calcium phosphate precipitation method of Graham and van
der Eb, Virology, 52:456-457 (1978) may be used. General aspects of
mammalian cell host system transformations have been described by Axel in
U.S. 4,399,216 issued 16 August 1983. Transformations into yeast are
typically carried out according to the method of Van Solingen et al., J.
Bact., 130:946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA),
76:3829 (1979). However, other methods for introducing DNA into cells such
as by nuclear injection or by protoplast fusion may also be used.

In the preferred embodiment the DNA is introduced into the host cells
using electroporation. See Andreason, J. Tiss. Cult. Meth., 15:56-62
(1993), for a review of electroporation techniques useful for practicing
the instantly claimed invention. It was discovered that electroporation
techniques for introducing the DNA construct into the host cells were
preferable over calcium phosphate precipitation techniques insofar as the
latter could cause the DNA to break up and form concatemers.

The mammalian host cells used to express the product gene herein
may be cultured in a variety of media as discussed in the definitions
section above. The media contain the selection agent used for selecting
transformed host cells which have taken up the DNA construct (either as an
intra- or extra-chromosomal element). To achieve selection of the
transformed eukaryotic cells, the host cells may be grown in cell culture
plates and individual colonies expressing the selectable gene (and thus the
product gene) can be isolated and grown in growth medium until the
nutrients are depleted. The host cells are then analyzed for transcription
and/or transformation as discussed below. The culture conditions, such as
temperature, pH, and the like, are those previously used with the host cell
selected for expression, and will be apparent to the ordinarily skilled
artisan.

Gene amplification and/or expression may be measured in a sample
directly, for example, by conventional Southern blotting, Northern blotting
to quantitate the transcription of mRNA (Thomas, Proc. Natl. Acad. Sci.
USA, 77:5201-5205 [1980]), dot blotting (DNA analysis), or in situ
hybridization, using an appropriately labeled probe, based on the sequences
provided herein. Various labels may be employed, most commonly
radioisotopes, particularly 32P. However, other techniques may also be
employed, such as using biotin-modified nucleotides for introduction into
a polynucleotide. The biotin then serves as the site for binding to avidin
or antibodies, which may be labeled with a wide variety of labels, such as
radionuclides, fluorescers, enzymes, or the like. Alternatively,
antibodies may be employed that can recognize specific duplexes, including
DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein
duplexes. The antibodies in turn may be labeled and the assay may be
carried out where the duplex is bound to a surface, so that upon the
formation of duplex on the surface, the presence of antibody bound to the
duplex can be detected.

Gene expression, alternatively, may be measured by immunological
methods, such as immunohistochemical staining of tissue sections and assay
of cell culture or body fluids, to quantitate directly the expression of
gene product. With immunohistochemical staining techniques, a cell sample
is prepared, typically by dehydration and fixation, followed by reaction
with labeled antibodies specific for the gene product coupled, where the
labels are usually visually detectable, such as enzymatic labels,
fluorescent labels, luminescent labels, and the like. A particularly
sensitive staining technique suitable for use in the present invention is
described by Hsu et al., Am. J. Clin. Path., 75:734-738 (1980).

In the preferred embodiment, the mRNA is analyzed by quantitative PCR
(to determine the efficiency of splicing) and protein expression is
measured using ELISA as described in Example 1 herein.

The product of interest preferably is recovered from the culture
medium as a secreted polypeptide, although it also may be recovered from
host cell lysates when directly expressed without a secretory signal. When
the product gene is expressed in a recombinant cell other than one of human
origin, the product of interest is completely free of proteins or
polypeptides of human origin. However, it is necessary to purify the
product of interest from recombinant cell proteins or polypeptides to
obtain preparations that are substantially homogeneous as to the product
of interest. As a first step, the culture medium or lysate is centrifuged
to remove particulate cell debris. The product of interest thereafter is
purified from contaminant soluble proteins and polypeptides, for example,
by fractionation on immunoaffinity or ion-exchange columns; ethanol
precipitation; reverse phase HPLC; chromatography on silica or on an anion
exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate
precipitation; gel filtration using, for example, Sephadex G-75;
chromatography on plasminogen columns to bind the product of interest and
protein A Sepharose columns to remove contaminants such as IgG.

The following examples are offered by way of illustration only and
are not intended to limit the invention in any manner. All patent and
literature references cited herein are expressly incorporated by reference.

EXAMPLE 1 

tPA production using the dicistronic expression vectors

It was sought to increase the level of homogeneity with regard to
expression levels of stable clones by expressing a selectable marker (such
as DHFR) and the protein of interest from a single promoter. These vectors
divert most of the transcript to product expression while linking it at a
fixed ratio to DHFR expression via differential splicing.

Vectors were constructed which were derived from the vector pRK (Suva
et al., Science, 237:893-896 [1987]) which contains an intron between the
cytomegalovirus immediate early promoter (CMV) and the cDNA that encodes
the polypeptide of interest. The intron of pRK is 139 nucleotides in
length, has a splice donor site derived from cytomegalovirus immediate
early gene (CMVIE), and a splice acceptor site from an IgG heavy chain
variable region (VH) gene (Eaton et al., Biochem., 25:8343 [1986]).
DHFR/intron vectors were constructed by inserting an EcoRV linker
into the BstXI site present in the intron of pRK7. An 830 base-pair
fragment containing a mouse DHFR coding fragment was inserted to obtain
DHFR intron expression vectors which differ only in the sequence that
comprises the splice donor site. Those sequences were altered by
overlapping PCR mutagenesis to obtain sequences that match splice donor
sites found between exons 3 and 4 of normal and mutant Ras genes. PCR was
also used to destroy the splice donor site.

A mouse DHFR cDNA fragment (Simonsen et al., Proc. Natl. Acad. Sci.
USA, 80:2495-2499 [1983]) was inserted into the intron of this vector 59
nucleotides downstream of the splice donor site. The splice donor site of
this vector was altered by mutagenesis to change the ratio of spliced to
non-spliced message in transfected cells. It has previously been shown
that a single nucleotide change (G to A) converted a relatively efficient
splice donor site found in the normal ras gene into an inefficient splice
site (Cohen et al., Nature, 334:119-124 [1988]). This effect has been
demonstrated in the context of the ras gene and confirmed when these
sequences were transferred to human growth hormone constructs (Cohen et
al., Cell, 58:461-472 [1989]). Additionally, a non-functional 5' splice
site (GT to CA) was constructed as a control (ΔGT). A polylinker was
inserted 35 nucleotides downstream of the 3' splice site to accept the cDNA
of interest. A vector containing tPA (Pennica et al., Nature, 301:214-221
[1983]) was linearized downstream of the polyadenylation site before it was
introduced into CHO cells (Potter et al., Proc. Natl. Acad. Sci. USA,
81:7161 [1984]).

Plasmid DNAs that contained DHFR/intron, tPA and (a) wild type ras
(WT ras), i.e. Figure 3 (SEQ ID NO: 1), (b) mutant ras, or (c) non-
functional splice donor site (ΔGT) were introduced into CHO DHFR minus
cells by electroporation. The intron vectors were each linearized
downstream of the polyadenylation site by restriction endonuclease
treatment. The control vector was linearized downstream of the second
polyadenylation site. The DNAs were ethanol precipitated after
phenol/chloroform extraction and were resuspended in 20 μl of 1/10 Tris
EDTA. Then, 10 μg of DNA was incubated with 10^ CHO.dp12 cells (EP 307,247
published 15 March 1989) in 1 ml of PBS on ice for 10 min. before
electroporation at 400 volts and 330 μF using a BRL Cell Porator.

Cells were returned to ice for 10 min. before being plated into
non-selective medium. After 24 hours cells were fed nucleoside-free medium
to select for stable DHFR+ clones which were pooled. The pooled DHFR+
clones were lysed and mRNAs were prepared.

To prepare the mRNA, RNA was extracted from 5 x 10^ cells which were
grown from pools of more than 200 clones derived from the stable
transfection of the three vectors, the essential construction of which is
shown in Figure 1B, and from non-transfected CHO cells. RNA was purified
over oligo-dT cellulose (Collaborative Biomedical Products). 10 μg of mRNA
was then subjected to Northern blotting, which involved running the mRNA on
a 1.2% agarose, 6.6% formaldehyde gel and transferring it to a nylon
filter (Stratagene Duralon-UV membrane), which was prehybridized, probed
and washed according to the manufacturer's instructions.

The filter was probed sequentially using probes (shown in Figure 1B)
that would detect (a) the full length message, (b) both full length and
spliced message, or (c) beta actin. Probing with the long probe showed
that the vector that contains the efficient splice donor site (i.e. WT ras)
generates predominantly a mRNA of the size predicted for the spliced
product while the other two vectors gave rise primarily to a mRNA that
corresponds in size to non-spliced message. The DHFR probe detected only
full length message and demonstrated that the WT ras splice donor derived
vector generates very little full length message with which to confer a
DHFR positive phenotype.

Figure 4 shows the number of DHFR positive colonies obtained after
duplicate electroporations with the three intron vectors described above
and from a conventional vector that has a CMV promoter driving tPA and a
SV40 promoter driving DHFR (see Figure 2). The increase in colony number
parallels the increase in full length message that accumulates with the
modification of the splice donor sites. The conventional vector
efficiently generates colonies and does not vary significantly from the ΔGT
construct.

The level of tPA expression was determined by seeding cells in 1 ml
of F12:DMEM (50:50, with 5% FBS) in 24 well dishes to near confluency.
Growth of the cells continued until the media was exhausted. Media was
then assayed by ELISA for tPA production. Briefly, anti-tPA antibody was
coated onto the wells of an ELISA microtiter plate, media samples were
added to the wells followed by washing. Binding of the antigen (tPA) was
then quantified using horseradish peroxidase (HRPO) labelled anti-tPA
antibody.

Figure 5A depicts the titers of secreted tPA protein after pooling
the clones of each group shown in Figure 4. While the number of colonies
increased with a weakening of splice donor function, the inverse was seen
with respect to tPA expression. The expression levels are consistent with
the RNA products that are observed; as more of the dicistronic message is
spliced, an increased amount of message will contain tPA as the first open
reading frame, resulting in increased tPA expression. A mutation of GT to
CA in the splice donor site results in an abundance of DHFR positive
colonies which express undetectable levels of tPA, possibly resulting from
inefficient utilization of the second AUG. Importantly, Figure 5A also
shows that expression levels obtained from one of the dicistronic vectors
(with WT ras SD) were about threefold higher than those obtained with the
control vector containing a CMV promoter/enhancer driving tPA, SV40
promoter/enhancer controlling DHFR and SV40 polyadenylation signals
controlling the expression of tPA and DHFR.

Additionally, the homogeneity of expression in the pools was
investigated. Figure 5B shows that all 20 clones generated by the WT ras
splice donor site derived dicistronic vectors express detectable levels of
tPA while only 4 of 20 clones generated by the control vector express tPA.
None of the clones transfected with the non-splicing (ΔGT) vector expressed
tPA levels detectable by ELISA. This finding is consistent with previous
observations that relatively few clones generated by conventional vectors
make useful levels of protein.

Expression of tPA was increased following methotrexate amplification
of pools. Figure 5C shows that 2 of the dicistronic vector derived pools
(i.e. with WT ras and MUTANT ras SD sites) increased in expression markedly
(8.4 and 7.7 fold), while the pool generated by the conventional vector
increased only slightly (2.8 fold) when each was subjected to 200 nM Mtx.
An overall increase of 9 fold was obtained using the best dicistronic (WT
ras SD) versus the conventional vector following amplification. Growth of
the highest expressing amplified pool in nutrient rich production medium
yielded titers of 4.2 μg/ml tPA.
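
The fold figures in this example can be cross-checked with simple
arithmetic. The sketch below is illustrative only: it assumes that the
roughly threefold pre-amplification advantage reported for Figure 5A and the
amplification factors reported for Figure 5C refer to the same pools, and
that the 8.4 fold figure belongs to the WT ras SD pool. The function and
variable names are hypothetical and not part of the original disclosure.

```python
# Illustrative cross-check of the fold increases reported in Example 1.
# Assumptions (not stated explicitly in the text): the ~3-fold advantage
# before amplification and the amplification factors describe the same
# pools, and 8.4 fold is the WT ras SD pool.

pre_amp_advantage = 3.0       # WT ras SD pool vs. conventional pool, before Mtx
dicistronic_amp_fold = 8.4    # WT ras SD pool after 200 nM Mtx
conventional_amp_fold = 2.8   # conventional pool after 200 nM Mtx


def relative_titer(pre_ratio: float, amp_a: float, amp_b: float) -> float:
    """Ratio of titer A to titer B after each pool has been amplified."""
    return pre_ratio * amp_a / amp_b


overall = relative_titer(pre_amp_advantage, dicistronic_amp_fold,
                         conventional_amp_fold)
print(f"overall advantage after amplification: about {overall:.0f} fold")
```

Under these assumptions the product of the pre-amplification ratio and the
ratio of amplification factors reproduces the overall figure quoted above.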

It was shown that manipulation of the splice donor sequence alters
the ratio of spliced to full length message and the number of colonies that
form in selective medium. It was also shown that dicistronic expression
vectors generate clones that express high levels of recombinant proteins.
Surprisingly, it was possible to isolate high expressors which had the
efficient WT ras splice donor site by selection for DHFR+ cells despite the
efficiency with which the DHFR gene was spliced from the RNA precursors
formed in these cells.



EXAMPLE 2

TNFr-IgG production using the dicistronic expression vectors

To prove the general applicability of this approach, a second product
was evaluated in the dicistronic vector system containing, as the DNA of
interest, an immunoadhesin (TNFr-IgG) capable of binding tumor necrosis
factor (TNF) (Ashkenazi et al., Proc. Natl. Acad. Sci. USA, 88:10535-10539
[1991]). The experiments described in Example 1 above were essentially
repeated except that the product gene encoded the immunoadhesin TNFr-IgG.
Plasmid DNAs that contained a TNFr-IgG cDNA and (a) WT ras, i.e. Figure
6 (SEQ ID NO: 2), (b) mutant ras or (c) nonfunctional splice donor site
(ΔGT) were introduced into the dp12.CHO cells as discussed for Example 1.
See Figure 1C for an illustration of the DNA constructs.

It was discovered that the number of DHFR positive colonies generated
by three of these vectors was similar to that seen with the tPA constructs.
Expression of TNFr-IgG also paralleled that seen with the tPA constructs
(Figure 7A). Amplification of pools from two of the constructs showed a
marked increase in expression of immunoadhesin (9.6 and 6.8 fold) (Figure
7B). The best of these amplified pools expressed 9.5 μg/ml when grown in
nutrient rich production medium.

Thus, it was again shown that dicistronic expression vectors generate
clones that express high levels of recombinant proteins. Furthermore,
contrary to expectations, it was discovered that isolation of high product
expressing host DHFR+ cells was possible using an efficient splice donor
site (i.e. the WT ras splice donor site).

EXAMPLE 3 

Antibody production using a dicistronic expression vector 

The usefulness of this system for antibody expression was evaluated
by testing production of an antibody directed against IgE (Presta et al.,
Journal of Immunology, 151:2623-2632 [1993]). Further, the flexibility of
the system with regard to transcription initiation was tested by replacing
the CMV promoter/enhancer present in the previous vectors with the
promoter/enhancer derived from the early region of SV40 virus (Griffin,
B., Structure and Genomic Organization of SV40 and Polyoma Virus, In J.
Tooze [Ed] DNA Tumor Viruses, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York). The heavy chain of the antibody was inserted downstream
of DHFR as described in the earlier tPA and TNFr-IgG constructs.

Additionally, a new splice donor site sequence (GAC:GTAAGT) was engineered
into the vector which matches the consensus splice donor site more closely
than did the splice donor sites present in the vectors tested in Examples
1 and 2. The resultant expression vector is shown in Figures 1D and 9.
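
As a rough illustration of what comparing a donor site with a consensus
involves, the sketch below counts matching positions between a candidate
site written as exon:intron and a consensus given in IUPAC codes. The
consensus used here (MAG|GTRAGT, with M = A or C and R = A or G) is an
assumption taken from general knowledge of 5' splice sites, not from this
document, and the helper names are hypothetical. Position counting is only
a crude proxy and is not the measure of splicing efficiency used in the
examples.

```python
# Sketch: compare a candidate 5' splice donor site with a consensus.
# The consensus MAG|GTRAGT is an assumption, not taken from this document.

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
CONSENSUS_EXON = "MAG"        # last three exon bases (assumed)
CONSENSUS_INTRON = "GTRAGT"   # first six intron bases (assumed)


def donor_matches(site: str) -> int:
    """Count positions of an 'exon:intron' site that agree with the consensus."""
    exon, intron = site.upper().split(":")
    consensus = CONSENSUS_EXON + CONSENSUS_INTRON
    candidate = exon + intron
    if len(candidate) != len(consensus):
        raise ValueError("expected three exon bases and six intron bases")
    return sum(base in IUPAC[code] for base, code in zip(candidate, consensus))


def has_gt(site: str) -> bool:
    """True if the intron begins with the GT dinucleotide."""
    return site.upper().split(":")[1].startswith("GT")


example3_site = "GAC:GTAAGT"  # donor site engineered in Example 3
print(donor_matches(example3_site), "of 9 positions match;",
      "GT present:", has_gt(example3_site))
```

The ras-derived donor sequences are not spelled out in this section, so
they are not scored here. A site in which GT is changed to CA, as in the
non-functional control, would fail the `has_gt` check.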

It was discovered that this vector produced fewer colonies than the
vectors previously tested, and produced predominantly a spliced RNA
product. A second vector was constructed to have the light chain of the
antibody under control of the SV40 promoter/enhancer and poly-A and the
hygromycin B resistance gene under control of the CMV promoter/enhancer and
SV40 poly-A. These vectors were linearized at unique HpaI sites downstream
of the poly-A signal, mixed at a ratio of light chain vector to heavy chain
vector of 10:3 and electroporated into CHO cells using an optimized
protocol (as discussed in Examples 1 and 2).

Figure 11 shows the levels of antibody expressed by clones and pools
after selection in hygromycin B followed by selection for DHFR expression.
All 20 of the clones analyzed expressed high levels of antibody when grown
in rich medium and varied from one another by only a factor of four. A
pool of antibody producing clones was generated and assayed shortly after
it was established. That pool was grown continuously for 6 weeks without
a significant decrease in productivity, demonstrating that its stability was
sufficient to generate gram quantities of protein from its large scale
culture.

The pool was subjected to methotrexate amplification at 200 nM and 1 μM
and achieved a greater than 2 fold increase in antibody titer. The 1 μM Mtx
resistant pool achieved a titer of 41 mg/L when grown under optimal
conditions in suspension culture.
The structure of the expressed antibody was examined. Proteins
expressed by the 200 nM methotrexate resistant pool and by a well
characterized expression clone generated by conventional vectors (Presta
et al. [1993], supra) were metabolically labeled with S-35 cysteine and
methionine. In particular, confluent 35 mm plates of cells were
metabolically labeled with 50 μCi each S-35 methionine and S-35 cysteine
(Amersham) in serum free cysteine and methionine free F12:DMEM. After one
hour, nutrient rich production media was added and labeled proteins were
allowed to "chase" into the medium for six more hours. Proteins were run
on a 12% SDS/PAGE gel (NOVEX) non-reduced or following reduction with
β-mercaptoethanol. Dried gels were exposed to film for 16 hours. CHO
control cells were also labeled.

The majority of the antibody protein is secreted with a molecular
weight of about 155 kilodaltons, consistent with a properly
disulfide-linked antibody molecule with 2 light and 2 heavy chains. Upon
reduction the molecular weight shifts to 2 approximately equally abundant
proteins of 22.5 and 55 kilodaltons. The protein generated from the pool is
indistinguishable from the antibody produced by the well characterized
expression clone, with no apparent increase of free heavy or light chain
expressed by the pool.
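
The consistency between the reduced and non-reduced band sizes can be
checked with a short calculation. The sketch below simply sums the reported
chain sizes for a molecule of 2 light and 2 heavy chains; the function name
is hypothetical and the values are the approximate gel estimates quoted
above.

```python
# Illustrative check: expected size of an intact antibody assembled from the
# reduced chain sizes reported above (2 light + 2 heavy chains).

LIGHT_KDA = 22.5      # reduced light chain, as reported
HEAVY_KDA = 55.0      # reduced heavy chain, as reported
OBSERVED_KDA = 155.0  # non-reduced band, reported as "about 155 kilodaltons"


def tetramer_mass(light: float, heavy: float,
                  n_light: int = 2, n_heavy: int = 2) -> float:
    """Sum of chain sizes for an antibody with the given chain counts."""
    return n_light * light + n_heavy * heavy


expected = tetramer_mass(LIGHT_KDA, HEAVY_KDA)
print(f"expected {expected} kDa, observed about {OBSERVED_KDA} kDa")
```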

CONCLUSION 

The efficient expression system described herein utilizes vectors
consisting of promoter/enhancer elements followed by an intron containing
the selectable marker coding sequence, followed by the cDNA of interest and
a polyadenylation signal.

Several splice donor site sequences were tested for their effect on
colony number and expression of the cDNA of interest. A non-functional
splice donor site, splice donor sites found in an intron between exons 3
and 4 of mutant (mutant ras) and normal (WT ras) forms of the Harvey Ras
gene and another efficient SD site (see Example 3) were used. The vectors
were designed to direct expression of dicistronic primary transcripts.
Within a transfected cell some of the transcripts remain full length while
the remainder are spliced to excise the DHFR coding sequence. When the
splice donor site is weakened or destroyed an increase in colony number
is observed.

Expression levels show the inverse pattern, with the most efficient
splice donor sites generating the highest levels of tPA, TNFr immunoadhesin
or anti-IgE VH.

The homogeneity of expression of clones generated by the ras splice
donor site intron DHFR vectors was compared to clones generated from a
conventional vector with a separate promoter/enhancer and polyadenylation
signal for each of DHFR and tPA. The DHFR intron vector gives rise to
colonies that are much more homogeneous with regard to expression than
those generated by the conventional vector. Non-expressing clones derived
from the conventional vector may be the result of breaks in the tPA or
TNFr-IgG domain of the plasmid during integration into the genome or the
result of methylation of promoter elements (Busslinger et al., Cell,
34:197-206 [1983]; Watt et al., Genes and Development, 2:1136-1143 [1988])
driving tPA or TNFr-IgG expression. Promoter silencing by methylation or
breaks in the DHFR-intron vectors would very likely render them incapable
of conferring a DHFR positive phenotype.

It was found that pools generated by the DHFR-intron vectors could
be amplified in methotrexate and would increase in expression by a factor
of 8.4 (tPA), or 9.8 (TNFr-IgG). Pools from conventional vectors increased
by only 2.8 and 3.0 fold for tPA and TNFr-IgG when amplified similarly.
Amplified pools resulted in 9 fold higher tPA levels and 15 fold higher
TNFr-IgG levels when compared to the conventional vector amplified pools.

Without being limited to any theory, the increase in expression of
methotrexate resistant pools derived from the dicistronic vectors is likely
due to the transcriptional linkage of DHFR and the product; when cells are
selected for increased DHFR expression they consistently over-express
product. Conventional approaches lack selectable marker and cDNA
expression linkage and therefore methotrexate amplification often generates
DHFR overexpression without the concomitant increase in product expression.

Further increases of 4 and 6.3 fold in expression were obtained when
amplified tPA and TNFr-IgG pools were transferred from the media used for
the selections and amplifications to a nutrient rich production medium.

In Example 3, the expression vector had a splice donor site that more
closely matches the consensus splice donor sequence and had the heavy chain
of a humanized anti-IgE antibody inserted downstream. This vector was
linearized and co-electroporated with a second linearized vector that
expresses the hygromycin resistance gene and the light chain of the
antibody each under the control of its own promoter/enhancer and poly-A
signals. An excess of light chain expression vector over the heavy chain
dicistronic expression vector was used to bias in favor of light chain
expression. Clones and a pool were generated after hygromycin B and DHFR
selections. The clones were found to express relatively consistent, high
levels of antibody, as did the pool. The 1 μM pool achieved a titer of
41 mg/L when grown under optimal conditions in suspension culture.

The anti-IgE antibody was assessed by metabolic labeling followed by
SDS/PAGE under reducing and non-reducing conditions and found to be
indistinguishable from the protein expressed by a highly characterized
clonal cell line. Of particular importance is the finding that no free
light chain is observed in the pool relative to the clone.

A stable expression system for CHO cells has been developed that
produces high levels of recombinant proteins rapidly and with less effort
than that required by other expression systems. The vector system
generates stable clones that express consistently high levels thereby
reducing the number of clones that must be screened to obtain a highly
productive clonal line. Alternatively, pools have been used to
conveniently generate moderate to high levels of protein. This approach
may be particularly useful when a number of related proteins are to be
expressed and compared.

Without being limited to this theory, it is possible the vectors that
have very efficient splice donor sites generate very productive clones
because so little transcript remains non-spliced that only integration
events that lead to the generation of high levels of RNA produce enough
DHFR protein to give rise to colonies in selective medium. The high level
of spliced message from such clones is then translated into abundant
amounts of the protein of interest. Pools of clones made concurrently by
introducing conventional vectors expressed lower levels of protein, and
were unstable with regard to long term expression, and expression could not
be appreciably increased when the cells were subjected to methotrexate
amplification.



The system developed herein is versatile in that it allows high
levels of single and multiple subunit polypeptides to be rapidly generated
from clones or pools of stable transfectants. This expression system
combines the advantages of transient expression systems (rapid and
non-labor-intensive generation of research amounts of protein) with the
concurrent development of highly productive stable production cell lines.
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/286740
(B) FILING DATE: 05-AUG-1994

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Lee, Wendy M.
(B) REGISTRATION NUMBER: 00,000
(C) REFERENCE/DOCKET NUMBER: 798PCT

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 415/225-1994
(B) TELEFAX: 415/952-9881
(C) TELEX: 910/371-7168

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7360 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TAAGCTTGGG CTGCAGGTCG CCGTGAATTT 1600
AAGGGACGCT GTGAAGCAAT CATGGATGCA ATGAAGAGAG GGCTCTGCTG 1650
TGTGCTGCTG CTGTGTGGAG CAGTCTTCGT TTCGCCCAGC CAGGAAATCC 1700
ATGCCCGATT CAGAAGAGGA GCCAGATCTT ACCAAGTGAT CTGCAGAGAT 1750
GAAAAAACGC AGATGATATA CCAGCAACAT CAGTCATGGC TGCGCCCTGT 1800
GCTCAGAAGC AACCGGGTGG AATATTGCTG GTGCAACAGT GGCAGGGCAC 1850
AGTGCCACTC AGTGCCTGTC AAAAGTTGCA GCGAGCCAAG GTGTTTCAAC 1900
GGGGGCACCT GCCAGCAGGC CCTGTACTTC TCAGATTTCG TGTGCCAGTG 1950
CCCCGAAGGA TTTGCTGGGA AGTGCTGTGA AATAGATACC AGGGCCACGT 2000
GCTACGAGGA CCAGGGCATC AGCTACAGGG GCACGTGGAG CACAGCGGAG 2050
AGTGGCGCCG AGTGCACCAA CTGGAACAGC AGCGCGTTGG CCCAGAAGCC 2100
CTACAGCGGG CGGAGGCCAG ACGCCATCAG GCTGGGCCTG GGGAACCACA 2150
ACTACTGCAG AAACCCAGAT CGAGACTCAA AGCCCTGGTG CTACGTCTTT 2200
AAGGCGGGGA AGTACAGCTC AGAGTTCTGC AGCACCCCTG CCTGCTCTGA 2250
GGGAAACAGT GACTGCTACT TTGGGAATGG GTCAGCCTAC CGTGGCACGC 2300
ACAGCCTCAC CGAGTCGGGT GCCTCCTGCC TCCCGTGGAA TTCCATGATC 2350
CTGATAGGCA AGGTTTACAC AGCACAGAAC CCCAGTGCCC AGGCACTGGG 2400
CCTGGGCAAA CATAATTACT GCCGGAATCC TGATGGGGAT GCCAAGCCCT 2450
GGTGCCACGT GCTGAAGAAC CGCAGGCTGA CGTGGGAGTA CTGTGATGTG 2500
CCCTCCTGCT CCACCTGCGG CCTGAGACAG TACAGCCAGC CTCAGTTTCG 2550
CATCAAAGGA GGGCTCTTCG CCGACATCGC CTCCCACCCC TGGCAGGCTG 2600
CCATCTTTGC CAAGCACAGG AGGTCGCCCG GAGAGCGGTT CCTGTGCGGG 2650
GGCATACTCA TCAGCTCCTG CTGGATTCTC TCTGCCGCCC ACTGCTTCCA 2700
GGAGAGGTTT CCGCCCCACC ACCTGACGGT GATCTTGGGC AGAACATACC 2750
GGGTGGTCCC TGGCGAGGAG GAGCAGAAAT TTGAAGTCGA AAAATACATT 2800
GTCCATAAGG AATTCGATGA TGACACTTAC GACAATGACA TTGCGCTGCT 2850
GCAGCTGAAA TCGGATTCGT CCCGCTGTGC CCAGGAGAGC AGCGTGGTCC 2900
GCACTGTGTG CCTTCCCCCG GCGGACCTGC AGCTGCCGGA CTGGACGGAG 2950
TGTGAGCTCT CCGGCTACGG CAAGCATGAG GCCTTGTCTC CTTTCTATTC 3000
GGAGCGGCTG AAGGAGGCTC ATGTCAGACT GTACCCATCC AGCCGCTGCA 3050
CATCACAACA TTTACTTAAC AGAACAGTCA CCGACAACAT GCTGTGTGCT 3100
GGAGACACTC GGAGCGGCGG GCCCCAGC7A AACTTGCACG ACGCCTGCCA 3150
GGGCGATTCG GGAGGCCCCC TGGTGTGTCT GAACGATGGC CGCATGACTT 3200
TGGTGGGCAT CATCAGCTGG GGCCTGGGCT GTGGACAGAA GGATGTCCCG 3250
GGTGTGTACA CCAAGGTTAC CAACTACCTA GACTGGATTC GTGACAACAT 3300
GCGACCGTGA CCAGGAACAC CCGACTCCTC AAAAGCAAAT GAGATCCCGC 3350
CTCTTCTTCT TCAGAAGACA CTGCAAAGGC GCAGTGCTTC TCTACAGACT 3400
TCTCCAGACC CACCACACCG CAGAAGCGGG ACGAGACCCT ACAGGAGAGG 3450
GAAGAGTGCA TTTTCCCAGA TACTTCCCAT TTTGGAAGTT TTCAGGACTT 3500
GGTCTGATTT CAGGATACTC TGTCAGATGG GAAGACATGA ATGCACACTA 3550
GCCTCTCCAG GAATGCCTCC TCCCTGGGCA GAAGTGGGGG GAATTCAATC 3600
GATGGCCGCC ATGGCCCAAC TTGTTTATTG CAGCTTATAA TGGTTACAAA 3650
TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT TTTCACTGCA 3700
TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 3750
TCGATCGGGA ATTAATTCGG CGCAGCACCA TGGCCTGAAA TAACCTCTGA 3800
AAGAGGAACT TGGTTAGGTA CCTTCTGAGG CGGAAAGAAC CAGCTGTGGA 3850
ATGTGTGTCA GTTAGGGTGT GGAAAGTCCC CAGGCTCCCC AGCAGGCAGA 3900
AGTATGCAAA GCATGCATCT CAATTAGTCA GCAACCAGGT GTGGAAAGTC 3950
CCCAGGCTCC CCAGCAGGCA GAAGTATGCA AAGCATGCAT CTCAATTAGT 4000
CAGCAACCAT AGTCCCGCCC CTAACTCCGC CCATCCCGCC CCTAACTCCG 4050
CCCAGTTCCG CCCATTCTCC GCCCCATGGC TGACTAATTT TTTTTATTTA 4100
TGCAGAGGCC GAGGCCGCCT CGGCCTCTGA GCTATTCCAG AAGTAGTGAG 4150
GAGGCTTTTT TGGAGGCCTA GGCTTTTGCA AAAAGCTGTT AACAGCTTGG 4200
CACTGGCCGT CGTTTTACAA CGTCGTGACT GGGAAAACCC TGGCGTTACC 4250
CAACTTAATC GCCTTGCAGC ACATCCCCCC TTCGCCAGCT GGCGTAATAG 4300
CGAAGAGGCC CGCACCGATC GCCCTTCCCA ACAGTTGCGT AGCCTGAATG 4350
GCGAATGGCG CCTGATGCGG TATTTTCTCC TTACGCATCT GTGCGGTATT 4400
TCACACCGCA TACGTCAAAG CAACCATAGT ACGCGCCCTG TAGCGGCGCA 4450
TTAAGCGCGG CGGGTGTGGT GGTTACGCGC AGCGTGACCG CTACACTTGC 4500
CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT CTTCCCTTCC TTTCTCGCCA 4550
CGTTCGCCGG CTTTCCCCGT CAAGCTCTAA ATCGGGGGCT CCCTTTAGGG 4600
TTCCGATTTA GTGCTTTACG GCACCTCGAC CCCAAAAAAC TTGATTTGGG 4650
TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCCCTT 4700
TGACGTTGGA GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA 4750
ACAACACTCA ACCCTATCTC GGGCTATTCT TTTGATTTAT AAGGGATTTT 4800
GCCGATTTCG GCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA 4850
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ACGCGAATTT TAACAAAATA TTAACGTTTA CAATTTTATG GTGCACTCTC 4 900 
AGTACAATCT GCTCTGATGC CGCATAGTTA AGCCAACTCC GCTATCGCTA 4 9 50 

5 

CGTGACTGGG TCATGGCTGC GCCCCGACAC CCGCCAACAC CCGCTGACGC 5000
GCCCTGACGG GCTTGTCTGC TCCCGGCATC CGCTTACAGA CAAGCTGTGA 5050
CCGTCTCCGG GAGCTGCATG TGTCAGAGGT TTTCACCGTC ATCACCGAAA 5100
CGCGCGAGGC AGTATTCTTG AAGACGAAAG GGCCTCGTGA TACGCCTATT 5150
TTTATAGGTT AATGTCATGA TAATAATGGT TTCTTAGACG TCAGGTGGCA 5200
CTTTTCGGGG AAATGTGCGC GGAACCCCTA TTTGTTTATT TTTCTAAATA 5250
CATTCAAATA TGTATCCGCT CATGAGACAA TAACCCTGAT AAATGCTTCA 5300
ATAATATTGA AAAAGGAAGA GTATGAGTAT TCAACATTTC CGTGTCGCCC 5350
TTATTCCCTT TTTTGCGGCA TTTTGCCTTC CTGTTTTTGC TCACCCAGAA 5400
ACGCTGGTGA AAGTAAAAGA TGCTGAAGAT CAGTTGGGTG CACGAGTGGG 5450
TTACATCGAA CTGGATCTCA ACAGCGGTAA GATCCTTGAG AGTTTTCGCC 5500
CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT GCTATGTGGC 5550
GCGGTATTAT CCCGTGATGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT 5600
ACACTATTCT CAGAATGACT TGGTTGAGTA CTCACCAGTC ACAGAAAAGC 5650
ATCTTACGGA TGGCATGACA GTAAGAGAAT TATGCAGTGC TGCCATAACC 5700
ATGAGTGATA ACACTGCGGC CAACTTACTT CTGACAACGA TCGGAGGACC 5750
GAAGGAGCTA ACCGCTTTTT TGCACAACAT GGGGGATCAT GTAACTCGCC 5800
TTGATCGTTG GGAACCGGAG CTGAATGAAG CCATACCAAA CGACGAGCGT 5850
GACACCACGA TGCCAGCAGC AATGGCAACA ACGTTGCGCA AACTATTAAC 5900
TGGCGAACTA CTTACTCTAG CTTCCCGGCA ACAATTAATA GACTGGATGG 5950
AGGCGGATAA AGTTGCAGGA CCACTTCTGC GCTCGGCCCT TCCGGCTGGC 6000



TGGTTTATTG CTGATAAATC TGGAGCCGGT GAGCGTGGGT CTCGCGGTAT 6050
CATTGCAGCA CTGGGGCCAG ATGGTAAGCC CTCCCGTATC GTAGTTATCT 6100
ACACGACGGG GAGTCAGGCA ACTATGGATG AACGAAATAG ACAGATCGCT 6150
GAGATAGGTG CCTCACTGAT TAAGCATTGG TAACTGTCAG ACCAAGTTTA 6200
CTCATATATA CTTTAGATTG ATTTAAAACT TCATTTTTAA TTTAAAAGGA 6250
TCTAGGTGAA GATCCTTTTT GATAATCTCA TGACCAAAAT CCCTTAACGT 6300
GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA TCAAAGGATC 6350
TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA 6400
AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT 6450
CTTTTTCCGA AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT 6500
CCTTCTAGTG TAGCCGTAGT TAGGCCACCA CTTCAAGAAC TCTGTAGCAC 6550
CGCCTACATA CCTCGCTCTG CTAATCCTGT TACCAGTGGC TGCTGCCAGT 6600
GGCGATAAGT CGTGTCTTAC CGGGTTGGAC TCAAGACGAT AGTTACCGGA 6650
TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA CAGCCCAGCT 6700
TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCATTGA 6750
GAAAGCGCCA CGCTTCCCGA AGGGAGAAAG GCGGACAGGT ATCCGGTAAG 6800
CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA GGGGGAAACG 6850
CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT 6900
CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG 6950
CAACGCGGCC TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA 7000
TGTTCTTTCC TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC 7050
TTTGAGTGAG CTGATACCGC TCGCCGCAGC CGAACGACCG AGCGCAGCGA 7100
GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA CCGCCTCTCC 7150
CCGCGCGTTG GCCGATTCAT TAATCCAGCT GGCACGACAG GTTTCCCGAC 7200
TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT ACCTCACTCA 7250
TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG 7300
GAATTGTGAG CGGATAACAA TTTCACACAG GAAACAGCTA TGACCATGAT 7350
TACGAATTAA 7360

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6889 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TATCGATTGA ATTCCCCGGC CATAGCTGTC 1600
TGGCATGGGC CTCTCCACCG TGCCTGACCT GCTGCTGCCG CTGGTGCTCC 1650
TGGAGCTGTT GGTGGGAATA TACCCCTCAG GGGTTATTGG ACTGGTCCCT 1700
CACCTAGGGG ACAGGGAGAA GAGAGATAGT GTGTGTCCCC AAGGAAAATA 1750
TATCCACCCT CAAAATAATT CGATTTGCTG TACCAAGTGC CACAAAGGAA 1800
CCTACTTGTA CAATGACTGT CCAGGCCCGG GGCAGGATAC GGACTGCAGG 1850
GAGTGTGAGA GCGGCTCCTT CACCGCTTCA GAAAACCACC TCAGACACTG 1900
CCTCAGCTGC TCCAAATGCC GAAAGGAAAT GGGTCAGGTG GAGATCTCTT 1950
CTTGCACAGT GGACCGGGAC ACCGTGTGTG GCTGCAGGAA GAACCAGTAC 2000
CGGCATTATT GGAGTGAAAA CCTTTTCCAG TGCTTCAATT GCAGCCTCTG 2050
CCTCAATGGG ACCGTGCACC TCTCCTGCCA GGAGAAACAG AACACCGTGT 2100
GCACCTGCCA TGCAGGTTTC TTTCTAAGAG AAAACGAGTG TGTCTCCTGT 2150
AGTAACTGTA AGAAAAGCCT GGAGTGCACG AAGTTGTGCC TACCCCAGAT 2200
TGAGAATGTT AAGGGCACTG AGGACTCAGG CACCACAGAC AAGAGAGTTG 2250
AGCTCAAAAC CCCACTTGGT GACACAACTC ACACATGCCC ACGGTGCCCA 2300
GAGCCCAAAT CTTGTGACAC ACCTCCCCCG TGCCCACGGT GCCCAGAGCC 2350
CAAATCTTGT GACACACCTC CCCCATGCCC ACGGTGCCCA GAGCCCAAAT 2400
CTTGTGACAC ACCTCCCCCA TGCCCACGGT GCCCAGCACC TGAACTCCTG 2450
GGAGGACCGT CAGTCTTCCT CTTCCCCCCA AAACCCAAGG ATACCCTTAT 2500
GATTTCCCGG ACCCCTGAGG TCACGTGCGT GGTGGTGGAC GTGAGCCACG 2550
AAGACCCCGA GGTCCAGTTC AAGTGGTACG TGGACGGCGT GGAGGTGCAT 2600
AATGCCAAGA CAAAGCCGCG GGAGGAGCAG TTCAACAGCA CGTTCCGTGT 2650
GGTCAGCGTC CTCACCGTCC TGCACCAGGA CTGGCTGAAC GGCAAGGAGT 2700
ACAAGTGCAA GGTCTCCAAC AAAGCCCTCC CAGCCCCCAT CGAGAAAACC 2750
ATCTCCAAAA CCAAAGGACA GCCCCGAGAA CCACAGGTGT ACACCCTGCC 2800
CCCATCCCGG GAGGAGATGA CCAAGAACCA GGTCAGCCTG ACCTGCCTGG 2850
TCAAAGGCTT CTACCCCAGC GACATCGCCG TGGAGTGGGA GAGCAGCGGG 2900
CAGCCGGAGA ACAACTACAA CACCACGCCT CCCATGCTGG ACTCCGACGG 2950
CTCCTTCTTC CTCTACAGCA AGCTCACCGT GGACAAGAGC AGGTGGCAGC 3000
AGGGGAACAT CTTCTCATGC TCCGTGATGC ATGAGGCTCT GCACAACCGC 3050
TTCACGCAGA AGAGCCTCTC CCTGTCTCCG GGTAAATGAG TGCGACGGCC 3100 
GGGGATCCTC TAGAGTCGAC CTGCAGAAGC TTGGCCGCCA TGGCCCAACT 315 0 
TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 32 0 0 
TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA 32 5 0 
ACTCATCAAT GTATCTTATC ATGTCTGGAT CGATCGGGAA TTAATTCGGC 3300
GCAGCACCAT GGCCTGAAAT AACCTCTGAA AGAGGAACTT GGTTAGGTAC 33 50 
CTTCTGAGGC GGAAAGAACC AGCTGTGGAA TGTGTGTCAG TTAGGGTGTG 34 0 0 
GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 34 50 
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG 3500
AAGTATGCAA AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC 35 50 
TAACTCCGCC CATCCCGCCC CTAACTCCGC CCAGTTCCGC CCATTCTCCG 36 0 0 
CCCCATGGCT GACTAATTTT TTTTATTTAT GCAGAGGCCG AGGCCGCCTC 3 6 50 
GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT GGAGGCCTAG 3 7 00 
GCTTTTGCAA AAAGCTGTTA ACAGCTTGGC ACTGGCCGTC GTTTTACAAC 3 750 
GTCGTGACTG GGAAAACCCT GGCGTTACCC AACTTAATCG CCTTGCAGCA 38 00 
CATCCCCCCT TCGCCAGCTG GCGTAATAGC GAAGAGGCCC GCACCGATCG 38 50 
CCCTTCCCAA CAGTTGCGTA GCCTGAATGG CGAATGGCGC CTGATGCGGT 3 900 
ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT ACGTCAAAGC 3 95 0 
AACCATAGTA CGCGCCCTGT AGCGGCGCAT TAAGCGCGGC GGGTGTGGTG 4 000 
GTTACGCGCA GCGTGACCGC TACACTTGCC AGCGCCCTAG CGCCCGCTCC 4 0 50 
TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC TTTCCCCGTC 4100 
AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG TGCTTTACGG 4150
CACCTCGACC CCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC 4200
ATCGCCCTGA TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT 42 50 
TTAATAGTGG ACTCTTGTTC CAAACTGGAA CAACACTCAA CCCTATCTCG 43 00 
GGCTATTCTT TTGATTTATA AGGGATTTTG CCGATTTCGG CCTATTGGTT 4350 
AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT 44 0 0 
TAACGTTTAC AATTTTATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 44 5 0 
GCATAGTTAA GCCAACTCCG CTATCGCTAC GTGACTGGGT CATGGCTGCG 4 50 0 
CCCCGACACC CGCCAACACC CGCTGACGCG CCCTGACGGG CTTGTCTGCT 4 550 
CCCGGCATCC GCTTACAGAC AAGCTGTGAC CGTCTCCGGG AGCTGCATGT 4600
GTCAGAGGTT TTCACCGTCA TCACCGAAAC GCGCGAGGCA GTATTCTTGA 4 6 50 
AGACGAAAGG GCCTCGTGAT ACGCCTATTT TTATAGGTTA ATGTCATGAT 4 700 
AATAATGGTT TCTTAGACGT CAGGTGGCAC TTTTCGGGGA AATGTGCGCG 4 750 
GAACCCCTAT TTGTTTATTT TTCTAAATAC ATTCAAATAT GTATCCGCTC 4 800 
ATGAGACAAT AACCCTGATA AATGCTTCAA TAATATTGAA AAAGGAAGAG 4 850 
TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTGCGGCAT 4 90 0 
TTTGCCTTCC TGTTTTTGCT CACCCAGAAA CGCTGGTGAA AGTAAAAGAT 4950
GCTGAAGATC AGTTGGGTGC ACGAGTGGGT TACATCGAAC TGGATCTCAA 50 0 0 
CAGCGGTAAG ATCCTTGAGA GTTTTCGCCC CGAAGAACGT TTTCCAATGA 505 0 
TGAGCACTTT TAAAGTTCTG CTATGTGGCG CGGTATTATC CCGTGATGAC 5100
GCCGGGCAAG AGCAACTCGG TCGCCGCATA CACTATTCTC AGAATGACTT 5150 
GGTTGAGTAC TCACCAGTCA CAGAAAAGCA TCTTACGGAT GGCATGACAG 52 00 
TAAGAGAATT ATGCAGTGCT GCCATAACCA TGAGTGATAA CACTGCGGCC 52 50 
AACTTACTTC TGACAACGAT CGGAGGACCG AAGGAGCTAA CCGCTTTTTT 5300
GCACAACATG GGGGATCATG TAACTCGCCT TGATCGTTGG GAACCGGAGC 5350
TGAATGAAGC CATACCAAAC GACGAGCGTG ACACCACGAT GCCAGCAGCA 5400
ATGGCAACAA CGTTGCGCAA ACTATTAACT GGCGAACTAC TTACTCTAGC 5450
TTCCCGGCAA CAATTAATAG ACTGGATGGA GGCGGATAAA GTTGCAGGAC 5500
CACTTCTGCG CTCGGCCCTT CCGGCTGGCT GGTTTATTGC TGATAAATCT 5550
GGAGCCGGTG AGCGTGGGTC TCGCGGTATC ATTGCAGCAC TGGGGCCAGA 5600
TGGTAAGCCC TCCCGTATCG TAGTTATCTA CACGACGGGG AGTCAGGCAA 5650
CTATGGATGA ACGAAATAGA CAGATCGCTG AGATAGGTGC CTCACTGATT 5700
AAGCATTGGT AACTGTCAGA CCAAGTTTAC TCATATATAC TTTAGATTGA 5750
TTTAAAACTT CATTTTTAAT TTAAAAGGAT CTAGGTGAAG ATCCTTTTTG 5800
ATAATCTCAT GACCAAAATC CCTTAACGTG AGTTTTCGTT CCACTGAGCG 5850
TCAGACCCCG TAGAAAAGAT CAAAGGATCT TCTTGAGATC CTTTTTTTCT 5900
GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA CCAGCGGTGG 5950
TTTGTTTGCC GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC 6000
TTCAGCAGAG CGCAGATACC AAATACTGTC CTTCTAGTGT AGCCGTAGTT 6050
AGGCCACCAC TTCAAGAACT CTGTAGCACC GCCTACATAC CTCGCTCTGC 6100
TAATCCTGTT ACCAGTGGCT GCTGCCAGTG GCGATAAGTC GTGTCTTACC 6150
GGGTTGGACT CAAGACGATA GTTACCGGAT AAGGCGCAGC GGTCGGGCTG 6200
AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG ACCTACACCG 6250
AACTGAGATA CCTACAGCGT GAGCATTGAG AAAGCGCCAC GCTTCCCGAA 6300
GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA 6350
GCGCACGAGG GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG 6400
TCGGGTTTCG CCACCTCTGA CTTGAGCGTC GATTTTTGTG ATGCTCGTCA 6450
GGGGGGCGGA GCCTATGGAA AAACGCCAGC AACGCGGCCT TTTTACGGTT 6500
CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GTTCTTTCCT GCGTTATCCC 6550
CTGATTCTGT GGATAACCGT ATTACCGCCT TTGAGTGAGC TGATACCGCT 6600
CGCCGCAGCC GAACGACCGA GCGCAGCGAG TCAGTGAGCG AGGAAGCGGA 6650
AGAGCGCCCA ATACGCAAAC CGCCTCTCCC CGCGCGTTGG CCGATTCATT 6700
AATCCAGCTG GCACGACAGG TTTCCCGACT GGAAAGCGGG CAGTGAGCGC 6750
AACGCAATTA ATGTGAGTTA CCTCACTCAT TAGGCACCCC AGGCTTTACA 6800
CTTTATGCTT CCGGCTCGTA TGTTGTGTGG AATTGTGAGC GGATAACAAT 6850
TTCACACAGG AAACAGCTAT GACCATGATT ACGAATTAA 6889

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6557 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTCGAGCTCG CCCGACATTG ATTATTGACT AGAGTCGATC GACAGCTGTG 50
GAATGTGTGT CAGTTAGGGT GTGGAAAGTC CCCAGGCTCC CCAGCAGGCA 100
GAAGTATGCA AAGCATGCAT CTCAATTAGT CAGCAACCAG GTGTGGAAAG 150
TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA 200
GTCAGCAACC ATAGTCCCGC CCCTAACTCC GCCCATCCCG CCCCTAACTC 250
CGCCCAGTTC CGCCCATTCT CCGCCCCATG GCTGACTAAT TTTTTTTATT 300
TATGCAGAGG CCGAGGCCGC CTCGGCCTCT GAGCTATTCC AGAAGTAGTG 350
AGGAGGCTTT TTTGGAGGCC TAGGCTTTTG CAAAAAGCTA GCTTATCCGG 400
CCGGGAACGG TGCATTGGAA CGCGGATTCC CCGTGCCAAG AGTGACGTAA 450
GTACCGCCTA TAGAGCGATA AGAGGATTTT ATCCCCGCTG CCATCATGGT 500
TCGACCATTG AACTGCATCG TCGCCGTGTC CCAAAATATG GGGATTGGCA 550
AGAACGGAGA CCTACCCTGG CCTCCGCTCA GGAACGAGTT CAAGTACTTC 600
CAAAGAATGA CCACAACCTC TTCAGTGGAA GGTAAACAGA ATCTGGTGAT 650
TATGGGTAGG AAAACCTGGT TCTCCATTCC TGAGAAGAAT CGACCTTTAA 700
AGGACAGAAT TAATATAGTT CTCAGTAGAG AACTCAAAGA ACCACCACGA 750
GGAGCTCATT TTCTTGCCAA AAGTTTGGAT GATGCCTTAA GACTTATTGA 800
ACAACCGGAA TTGGCAAGTA AAGTAGACAT GGTTTGGATA GTCGGAGGCA 850
GTTCTGTTTA CCAGGAAGCC ATGAATCAAC CAGGCCACCT TAGACTCTTT 900
GTGACAAGGA TCATGCAGGA ATTTGAAAGT GACACGTTTT TCCCAGAAAT 950
TGATTTGGGG AAATATAAAC CTCTCCCAGA ATACCCAGGC GTCCTCTCTG 1000
AGGTCCAGGA GGAAAAAGGC ATCAAGTATA AGTTTGAAGT CTACGAGAAG 1050
AAAGACTAAC AGGAAGATGC TTTCAAGTTC TCTGCTCCCC TCCTAAAGCT 1100
ATGCATTTTT ATAAGACCAT GGGACTTTTG CTGGCTTTAG ATCCCCTTGG 1150
CTTCGTTAGA ACGCAGCTAC AATTAATACA TAACCTTATG TATCATACAC 1200
ATACGATTTA GGTGACACTA TAGATAACAT CCACTTTGCC TTTCTCTCCA 1250
CAGGTGTCCA CTCCCAGGTC CAACTGCACC TCGGTTCTAT CGATTGAATT 1300
CCACCATGGG ATGGTCATGT ATCATCCTTT TTCTAGTAGC AACTGCAACT 1350
GGAGTACATT CAGAAGTTCA GCTGGTGGAG TCTGGCGGTG GCCTGGTGCA 1400
GCCAGGGGGC TCACTCCGTT TGTCCTGTGC AGTTTCTGGC TACTCCATCA 1450
CCTCCGGATA TAGCTGGAAC TGGATCCGTC AGGCCCCGGG TAAGGGCCTG 1500
GAATGGGTTG CATCGATTAC GTATGCCGGA TCGACTAACT ATAACCCTAG 1550
CGTCAAGGGC CGTATCACTA TAAGTCGCGA CGATTCCAAA AACACATTCT 1600
ACCTGCAGAT GAACAGCCTG CGTGCTGAGG ACACTGCCGT CTATTATTGT 1650
GCTCGAGGCA GCCACTATTT CGGCGCCTGG CACTTCGCCG TGTGGGGTCA 1700
AGGAACCCTG GTCACCGTCT CCTCGGCCTC CACCAAGGGC CCATCGGTCT 1750
TCCCCCTGGC ACCCTCCTCC AAGAGCACCT CTGGGGGCAC AGCGGCCCTG 1800
GGCTGCCTGG TCAAGGACTA CTTCCCCGAA CCGGTGACGG TGTCGTGGAA 1850
CTCAGGCGCC CTGACCAGCG GCGTGCACAC CTTCCCGGCT GTCCTACAGT 1900
CCTCAGGACT CTACTCCCTC AGCAGCGTGG TGACTGTGCC CTCTAGCAGC 1950
TTGGGCACCC AGACCTACAT CTGCAACGTG AATCACAAGC CCAGCAACAC 2000
CAAGGTGGAC AAGAAAGTTG AGCCCAAATC TTGTGACAAA ACTCACACAT 2050
GCCCACCGTG CCCAGCACCT GAACTCCTGG GGGGACCGTC AGTCTTCCTC 2100
TTCCCCCCAA AACCCAAGGA CACCCTCATG ATCTCCCGGA CCCCTGAGGT 2150
CACATGCGTG GTGGTGGACG TGAGCCACGA AGACCCTGAG GTCAAGTTCA 2200
ACTGGTACGT GGACGGCGTG GAGGTGCATA ATGCCAAGAC AAAGCCGCGG 2250
GAGGAGCAGT ACAACAGCAC GTACCGTGTG GTCAGCGTCC TCACCGTCCT 2300
GCACCAGGAC TGGCTGAATG GCAAGGAGTA CAAGTGCAAG GTCTCCAACA 2350
AAGCCCTCCC AGCCCCCATC GAGAAAACCA TCTCCAAAGC CAAAGGGCAG 2400
CCCCGAGAAC CACAGGTGTA CACCCTGCCC CCATCCCGGG AAGAGATGAC 2450
CAAGAACCAG GTCAGCCTGA CCTGCCTGGT CAAAGGCTTC TATCCCAGCG 2500
ACATCGCCGT GGAGTGGGAG AGCAATGGGC AGCCGGAGAA CAACTACAAG 2550
ACCACGCCTC CCGTGCTGGA CTCCGACGGC TCCTTCTTCC TCTACAGCAA 2600
GCTCACCGTG GACAAGAGCA GGTGGCAGCA GGGGAACGTC TTCTCATGCT 2650
CCGTGATGCA TGAGGCTCTG CACAACCACT ACACGCAGAA GAGCCTCTCC 2700
CTGTCTCCGG GTAAATGAGT GCGACGGCCC TAGAGTCGAC CTGCAGAAGC 2750
TTGGCCGCCA TGGCCCAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT 2800
AAAGCAATAG CATCACAAAT TTCACAAATA AAGCATTTTT TTCACTGCAT 2850
TCTAGTTGTG GTTTGTCCAA ACTCATCAAT GTATCTTATC ATGTCTGGAT 2900
CGATCGGGAA TTAATTCGGC GCAGCACCAT GGCCTGAAAT AACCTCTGAA 2950
AGAGGAACTT GGTTAGGTAC CTTCTGAGGC GGAAAGAACC AGCTGTGGAA 3000
TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA 3050
GTATGCAAAG CATGCATCTC AATTAGTCAG CAACCAGGTG TGGAAAGTCC 3100
CCAGGCTCCC CAGCAGGCAG AAGTATGCAA AGCATGCATC TCAATTAGTC 3150
AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC CTAACTCCGC 3200
CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3250
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG 3300
AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA AAAGCTGTTA CCTCGAGCGG 3350
CCGCTTAATT AAGGCGCGCC ATTTAAATCC TGCAGGTAAC AGCTTGGCAC 3400
TGGCCGTCGT TTTACAACGT CGTGACTGGG AAAACCCTGG CGTTACCCAA 3450
CTTAATCGCC TTGCAGCACA TCCCCCCTTC GCCAGCTGGC GTAATAGCGA 3500
AGAGGCCCGC ACCGATCGCC CTTCCCAACA GTTGCGTAGC CTGAATGGCG 3550
AATGGCGCCT GATGCGGTAT TTTCTCCTTA CGCATCTGTG CGGTATTTCA 3600
CACCGCATAC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA 3650
AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG 3700
CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT 3750
TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC TTTAGGGTTC 3800
CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG ATTTGGGTGA 3850
TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 3900
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 3950
ACACTCAACC CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC 4000
GATTTCGGCC TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG 4050
CGAATTTTAA CAAAATATTA ACGTTTACAA TTTTATGGTG CACTCTCAGT 4100
ACAATCTGCT CTGATGCCGC ATAGTTAAGC CAACTCCGCT ATCGCTACGT 4150
GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG CTGACGCGCC 4200
CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 4250
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC 4300
GCGAGGCAGT ATTCTTGAAG ACGAAAGGGC CTCGTGATAC GCCTATTTTT 4350
ATAGGTTAAT GTCATGATAA TAATGGTTTC TTAGACGTCA GGTGGCACTT 4400
TTCGGGGAAA TGTGCGCGGA ACCCCTATTT GTTTATTTTT CTAAATACAT 4450
TCAAATATGT ATCCGCTCAT GAGACAATAA CCCTGATAAA TGCTTCAATA 4500
ATATTGAAAA AGGAAGAGTA TGAGTATTCA ACATTTCCGT GTCGCCCTTA 4550
TTCCCTTTTT TGCGGCATTT TGCCTTCCTG TTTTTGCTCA CCCAGAAACG 4600
CTGGTGAAAG TAAAAGATGC TGAAGATCAG TTGGGTGCAC GAGTGGGTTA 4650
CATCGAACTG GATCTCAACA GCGGTAAGAT CCTTGAGAGT TTTCGCCCCG 4700
AAGAACGTTT TCCAATGATG AGCACTTTTA AAGTTCTGCT ATGTGGCGCG 4750
GTATTATCCC GTGATGACGC CGGGCAAGAG CAACTCGGTC GCCGCATACA 4800
CTATTCTCAG AATGACTTGG TTGAGTACTC ACCAGTCACA GAAAAGCATC 4850
TTACGGATGG CATGACAGTA AGAGAATTAT GCAGTGCTGC CATAACCATG 4900
AGTGATAACA CTGCGGCCAA CTTACTTCTG ACAACGATCG GAGGACCGAA 4950
GGAGCTAACC GCTTTTTTGC ACAACATGGG GGATCATGTA ACTCGCCTTG 5000
ATCGTTGGGA ACCGGAGCTG AATGAAGCCA TACCAAACGA CGAGCGTGAC 5050
ACCACGATGC CAGCAGCAAT GGCAACAACG TTGCGCAAAC TATTAACTGG 5100
CGAACTACTT ACTCTAGCTT CCCGGCAACA ATTAATAGAC TGGATGGAGG 5150
CGGATAAAGT TGCAGGACCA CTTCTGCGCT CGGCCCTTCC GGCTGGCTGG 5200 
TTTATTGCTG ATAAATCTGG AGCCGGTGAG CGTGGGTCTC GCGGTATCAT 52 50 
TGCAGCACTG GGGCCAGATG GTAAGCCCTC CCGTATCGTA GTTATCTACA 53 00 
CGACGGGGAG TCAGGCAACT ATGGATGAAC GAAATAGACA GATCGCTGAG 5 350 
ATAGGTGCCT CACTGATTAA GCATTGGTAA CTGTCAGACC AAGTTTACTC 5 4 00 
ATATATACTT TAGATTGATT TAAAACTTCA TTTTTAATTT AAAAGGATCT 54 5 0 
AGGTGAAGAT CCTTTTTGAT AATCTCATGA CCAAAATCCC TTAACGTGAG 5 50 0 
TTTTCGTTCC ACTGAGCGTC AGACCCCGTA GAAAAGATCA AAGGATCTTC 5550
TTGAGATCCT TTTTTTCTGC GCGTAATCTG CTGCTTGCAA ACAAAAAAAC 56 00 
CACCGCTACC AGCGGTGGTT TGTTTGCCGG ATCAAGAGCT ACCAACTCTT 56 5 0 
TTTCCGAAGG TAACTGGCTT CAGCAGAGCG CAGATACCAA ATACTGTCCT 5700 
TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT GTAGCACCGC 5 75 0 
CTACATACCT CGCTCTGCTA ATCCTGTTAC CAGTGGCTGC TGCCAGTGGC 5 800 
GATAAGTCGT GTCTTACCGG GTTGGACTCA AGACGATAGT TACCGGATAA 5 850 
GGCGCAGCGG TCGGGCTGAA CGGGGGGTTC GTGCACACAG CCCAGCTTGG 5 900 
AGCGAACGAC CTACACCGAA CTGAGATACC TACAGCGTGA GCATTGAGAA 5 950 
AGCGCCACGC TTCCCGAAGG GAGAAAGGCG GACAGGTATC CGGTAAGCGG 6000
CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG GGAAACGCCT 6 05 0 
GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA 610 0 
TTTTTGTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA 615 0 
CGCGGCCTTT TTACGGTTCC TGGCCTTTTG CTGGCCTTTT GCTCACATGT 6 2 00 
TCTTTCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAT TACCGCCTTT 6250
GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC GCAGCGAGTC 6300
AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG CCTCTCCCCG 6350
CGCGTTGGCC GATTCATTAA TCCAGCTGGC ACGACAGGTT TCCCGACTGG 6400
AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTACC TCACTCATTA 6450
GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6500
TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC 6550
GAATTAA 6557

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7305 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG ACGTAAGTAC CGCCTATAGA 750
GTCTATAGGC CCACCCCCTT GGCTTCGTTA GAACGCGGCT ACAATTAATA 800
CATAACCTTA TGTATCATAC ACATACGATT TAGGTGACAC TATAGAATAA 850
CATCCACTTT GCCTTTCTCT CCACAGGTGT CCACTCCCAG GTCCAACTGC 900
ACCTCGGTTC TAAGCTTATC GATATGAAAA AGCCTGAACT CACCGCGACG 950
TCTGTCGAGA AGTTTCTGAT CGAAAAGTTC GACAGCGTCT CCGACCTGAT 1000
GCAGCTCTCG GAGGGCGAAG AATCTCGTGC TTTCAGCTTC GATGTAGGAG 1050
GGCGTGGATA TGTCCTGCGG GTAAATAGCT GCGCCGATGG TTTCTACAAA 1100
GATCGTTATG TTTATCGGCA CTTTGCATCG GCCGCGCTCC CGATTCCGGA 1150
AGTGCTTGAC ATTGGGGAAT TCAGCGAGAG CCTGACCTAT TGCATCTCCC 1200
GCCGTGCACA GGGTGTCACG TTGCAACACC TGCCTGAAAC CGAACTGCCC 1250
GCTGTTCTGC AGCCGGTCGC GGAGGCCATG GATGCGATCG CTGCGGCCGA 1300
TCTTAGCCAG ACGAGCGGGT TCGGCCCATT CGGACCGCAA GGAATCGGTC 1350
AATACACTAC ATGGCGTGAT TTCATATGCG CGATTGCTGA TCCCCATGTG 1400
TATCACTGGC AAACTGTGAT GGACGACACC GTCAGTGCGT CCGTCGCGCA 1450
GGCTCTCGAT GAGCTGATGC TTTGGGCCGA GGACTGCCCC GAAGTCCGGC 1500
ACCTCGTGCA CGCGGATTTC GGCTCCAACA ATGTCCTGAC GGACAATGGC 1550
CGCATAACAG CGGTCATTGA CTGGAGCGAG GCGATGTTCG GGGATTCCCA 1600
ATACGAGGTC GCCAACATCT TCTTCTGGAG GCCGTGGTTG GCTTGTATGG 1650
AGCAGCAGAC GTACTTCGAG CGGAGGCATC CGGAGCTTGC AGGATCGCCG 1700
CGGCTCCGGG CGTATATGCT CCGCATTGGT CTTGACCAAC TCTATCAGAG 1750
CTTGGTTGAC GGCAATTTCG ATGATGCAGC TTGGGCGCAG GGTCGATGCG 1800
ACGCAATCGT CCGATCCGGA GCCGGGACTG TCGGGCGTAC ACAAATCGCC 1850
CGCAGAAGCG CGGCCGTCTG GACCGATGGC TGTGTAGAAG TACTCGCCGA 1900
TAGTGGAAAC CGACGCCCCA GCACTCGTCC GAGGGCAAAG GAATAGAGTA 1950
GATGCCGACC GAAGGATCCC CGGGGAATTC AATCGATGGC CGCCATGGCC 2000
CAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC AATAGCATCA 2050
CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 2100
TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCGATC GGGAATTAAT 2150
TCGGCGCAGC ACCATGGCCT GAAATAACCT CTGAAAGAGG AACTTGGTTA 2200
GGTACCTTCT GAGGCGGAAA GAACCAGCTG TGGAATGTGT GTCAGTTAGG 2250
GTGTGGAAAG TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC 2300
ATCTCAATTA GTCAGCAACC AGGTGTGGAA AGTCCCCAGG CTCCCCAGCA 2350
GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA CCATAGTCCC 2400
GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2450
CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC 2500
GCCTCGGCCT CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG 2550
CCTAGGCTTT TGCAAAAAGC TAGCTTATCC GGCCGGGAAC GGTGCATTGG 2600
AACGCGGATT CCCCGTGCCA AGAGTCAGGT AAGTACCGCC TATAGAGTCT 2650
ATAGGCCCAC CCCCTTGGCT TCGTTAGAAC GCGGCTACAA TTAATACATA 2700
ACCTTTTGGA TCGATCCTAC TGACACTGAC ATCCACTTTT TCTTTTTCTC 2750
CACAGGTGTC CACTCCCAGG TCCAACTGCA CCTCGGTTCG CGAAGCTAGC 2800
TTGGGCTGCA TCGATTGAAT TCCACCATGG GATGGTCATG TATCATCCTT 2850
TTTCTAGTAG CAACTGCAAC TGGAGTACAT TCAGATATCC AGCTGACCCA 2900
GTCCCCGAGC TCCCTGTCCG CCTCTGTGGG CGATAGGGTC ACCATCACCT 2950
GCCGTGCCAG TCAGAGCGTC GATTACGATG GTGATAGCTA CATGAACTGG 3 00 0 
TATCAACAGA AACCAGGAAA AGCTCCGAAA CTACTGATTT ACGCGGCCTC 3 05 0 
GTACCTGGAG TCTGGAGTCC CTTCTCGCTT CTCTGGATCC GGTTCTGGGA 310 0 
CGGATTTCAC TCTGACCATC AGCAGTCTGC AGCCGGAAGA CTTCGCAACT 3150
TATTACTGTC AGCAAAGTCA CGAGGATCCG TACACATTTG GACAGGGTAC 32 00 
CAAGGTGGAG ATCAAACGAA CTGTGGCTGC ACCATCTGTC TTCATCTTCC 325 0 
CGCCATCTGA TGAGCAGTTG AAATCTGGAA CTGCCTCTGT TGTGTGCCTG 330 0 
CTGAATAACT TCTATCCCAG AGAGGCCAAA GTACAGTGGA AGGTGGATAA 3350
CGCCCTCCAA TCGGGTAACT CCCAGGAGAG TGTCACAGAG CAGGACAGCA 34 0 0 
AGGACAGCAC CTACAGCCTC AGCAGCACCC TGACGCTGAG CAAAGCAGAC 34 5 0 
TACGAGAAAC ACAAAGTCTA CGCCTGCGAA GTCACCCATC AGGGCCTGAG 3 5 00 
CTCGCCCGTC ACAAAGAGCT TCAACAGGGG AGAGTGTTAA GCTTCGATGG 3 5 50 
CCGCCATGGC CCAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG 36 0 0 
CAATAGCATC ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA 36 50 
GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT CTGGATCGAT 3 70 0 
CGGGAATTAA TTCGGCGCAG CACCATGGCC TGAAATAACC TCTGAAAGAG 3 75 0 
GAACTTGGTT AGGTACCTTC TGAGGCGGAA AGAACCAGCT GTGGAATGTG 380 0 
TGTCAGTTAG GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT 3850
GCAAAGCATG CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG 3 900 
GCTCCCCAGC AGGCAGAAGT ATGCAAAGCA TGCATCTCAA TTAGTCAGCA 3950
ACCATAGTCC CGCCCCTAAC TCCGCCCATC CCGCCCCTAA CTCCGCCCAG 4 00 0 
TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT ATTTATGCAG 4050
AGGCCGAGGC CGCCTCGGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC 4100
TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTGTTAACAG CTTGGCACTG 4150
GCCGTCGTTT TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAACT 4200
TAATCGCCTT GCAGCACATC CCCCCTTCGC CAGCTGGCGT AATAGCGAAG 4250
AGGCCCGCAC CGATCGCCCT TCCCAACAGT TGCGTAGCCT GAATGGCGAA 4300
TGGCGCCTGA TGCGGTATTT TCTCCTTACG CATCTGTGCG GTATTTCACA 4350
CCGCATACGT CAAAGCAACC ATAGTACGCG CCCTGTAGCG GCGCATTAAG 4400
CGCGGCGGGT GTGGTGGTTA CGCGCAGCGT GACCGCTACA CTTGCCAGCG 4450
CCCTAGCGCC CGCTCCTTTC GCTTTCTTCC CTTCCTTTCT CGCCACGTTC 4500
GCCGGCTTTC CCCGTCAAGC TCTAAATCGG GGGCTCCCTT TAGGGTTCCG 4550
ATTTAGTGCT TTACGGCACC TCGACCCCAA AAAACTTGAT TTGGGTGATG 4600
GTTCACGTAG TGGGCCATCG CCCTGATAGA CGGTTTTTCG CCCTTTGACG 4650
TTGGAGTCCA CGTTCTTTAA TAGTGGACTC TTGTTCCAAA CTGGAACAAC 4700
ACTCAACCCT ATCTCGGGCT ATTCTTTTGA TTTATAAGGG ATTTTGCCGA 4750
TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG 4800
AATTTTAACA AAATATTAAC GTTTACAATT TTATGGTGCA CTCTCAGTAC 4850
AATCTGCTCT GATGCCGCAT AGTTAAGCCA ACTCCGCTAT CGCTACGTGA 4900
CTGGGTCATG GCTGCGCCCC GACACCCGCC AACACCCGCT GACGCGCCCT 4950
GACGGGCTTG TCTGCTCCCG GCATCCGCTT ACAGACAAGC TGTGACCGTC 5000
TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC CGAAACGCGC 5050
GAGGCAGTAT TCTTGAAGAC GAAAGGGCCT CGTGATACGC CTATTTTTAT 5100
AGGTTAATGT CATGATAATA ATGGTTTCTT AGACGTCAGG TGGCACTTTT 5150
CGGGGAAATG TGCGCGGAAC CCCTATTTGT TTATTTTTCT AAATACATTC 5200
AAATATGTAT CCGCTCATGA GACAATAACC CTGATAAATG CTTCAATAAT 5250
ATTGAAAAAG GAAGAGTATG AGTATTCAAC ATTTCCGTGT CGCCCTTATT 53 0 0 
CCCTTTTTTG CGGCATTTTG CCTTCCTGTT TTTGCTCACC CAGAAACGCT 5350
GGTGAAAGTA AAAGATGCTG AAGATCAGTT GGGTGCACGA GTGGGTTACA 54 0 0 
TCGAACTGGA TCTCAACAGC GGTAAGATCC TTGAGAGTTT TCGCCCCGAA 54 5 0 
GAACGTTTTC CAATGATGAG CACTTTTAAA GTTCTGCTAT GTGGCGCGGT 55 0 0 
ATTATCCCGT GATGACGCCG GGCAAGAGCA ACTCGGTCGC CGCATACACT 55 5 0 
ATTCTCAGAA TGACTTGGTT GAGTACTCAC CAGTCACAGA AAAGCATCTT 56 00 
ACGGATGGCA TGACAGTAAG AGAATTATGC AGTGCTGCCA TAACCATGAG 56 5 0 
TGATAACACT GCGGCCAACT TACTTCTGAC AACGATCGGA GGACCGAAGG 5700
AGCTAACCGC TTTTTTGCAC AACATGGGGG ATCATGTAAC TCGCCTTGAT 5750
CGTTGGGAAC CGGAGCTGAA TGAAGCCATA CCAAACGACG AGCGTGACAC 5 8 00 
CACGATGCCA GCAGCAATGG CAACAACGTT GCGCAAACTA TTAACTGGCG 58 5 0 
AACTACTTAC TCTAGCTTCC CGGCAACAAT TAATAGACTG GATGGAGGCG 5 900 
GATAAAGTTG CAGGACCACT TCTGCGCTCG GCCCTTCCGG CTGGCTGGTT 5 9 50 
TATTGCTGAT AAATCTGGAG CCGGTGAGCG TGGGTCTCGC GGTATCATTG 6 0 00 
CAGCACTGGG GCCAGATGGT AAGCCCTCCC GTATCGTAGT TATCTACACG 6 050 
ACGGGGAGTC AGGCAACTAT GGATGAACGA AATAGACAGA TCGCTGAGAT 6100 
AGGTGCCTCA CTGATTAAGC ATTGGTAACT GTCAGACCAA GTTTACTCAT 6150 
ATATACTTTA GATTGATTTA AAACTTCATT TTTAATTTAA AAGGATCTAG 62 00 
GTGAAGATCC TTTTTGATAA TCTCATGACC AAAATCCCTT AACGTGAGTT 6250
TTCGTTCCAC TGAGCGTCAG ACCCCGTAGA AAAGATCAAA GGATCTTCTT 6300
GAGATCCTTT TTTTCTGCGC GTAATCTGCT GCTTGCAAAC AAAAAAACCA 6350
CCGCTACCAG CGGTGGTTTG TTTGCCGGAT CAAGAGCTAC CAACTCTTTT 6400
TCCGAAGGTA ACTGGCTTCA GCAGAGCGCA GATACCAAAT ACTGTCCTTC 6450
TAGTGTAGCC GTAGTTAGGC CACCACTTCA AGAACTCTGT AGCACCGCCT 6500
ACATACCTCG CTCTGCTAAT CCTGTTACCA GTGGCTGCTG CCAGTGGCGA 6550
TAAGTCGTGT CTTACCGGGT TGGACTCAAG ACGATAGTTA CCGGATAAGG 6600
CGCAGCGGTC GGGCTGAACG GGGGGTTCGT GCACACAGCC CAGCTTGGAG 6650
CGAACGACCT ACACCGAACT GAGATACCTA CAGCGTGAGC ATTGAGAAAG 6700
CGCCACGCTT CCCGAAGGGA GAAAGGCGGA CAGGTATCCG GTAAGCGGCA 6750
GGGTCGGAAC AGGAGAGCGC ACGAGGGAGC TTCCAGGGGG AAACGCCTGG 6800
TATCTTTATA GTCCTGTCGG GTTTCGCCAC CTCTGACTTG AGCGTCGATT 6850
TTTGTGATGC TCGTCAGGGG GGCGGAGCCT ATGGAAAAAC GCCAGCAACG 6900
CGGCCTTTTT ACGGTTCCTG GCCTTTTGCT GGCCTTTTGC TCACATGTTC 6950
TTTCCTGCGT TATCCCCTGA TTCTGTGGAT AACCGTATTA CCGCCTTTGA 7000
GTGAGCTGAT ACCGCTCGCC GCAGCCGAAC GACCGAGCGC AGCGAGTCAG 7050
TGAGCGAGGA AGCGGAAGAG CGCCCAATAC GCAAACCGCC TCTCCCCGCG 7100
CGTTGGCCGA TTCATTAATC CAGCTGGCAC GACAGGTTTC CCGACTGGAA 7150
AGCGGGCAGT GAGCGCAACG CAATTAATGT GAGTTACCTC ACTCATTAGG 7200
CACCCCAGGC TTTACACTTT ATGCTTCCGG CTCGTATGTT GTGTGGAATT 7250
GTGAGCGGAT AACAATTTCA CACAGGAAAC AGCTATGACC ATGATTACGA 7300
ATTAA 7305

CLAIMS

1. A DNA construct comprising a transcriptional initiation site, a
transcriptional termination site, a selectable gene, a product gene
provided 3' to the selectable gene, a transcriptional regulatory
region regulating transcription of both the selectable gene and the
product gene, the selectable gene being positioned within an intron
having a splice donor site 5' of the intron, which splice donor site
regulates expression of the product gene using the transcriptional
regulatory region.

2. The DNA construct of claim 1 wherein the splice donor site comprises 
an efficient splice donor sequence. 

3. The DNA construct of claim 2 wherein the splice donor site comprises
a consensus splice donor sequence.

4. The DNA construct of claim 2 wherein the splice donor site comprises 
the sequence GACGTAAGT.

5. The DNA construct of claim 1 wherein the selectable gene is an 
amplifiable gene. 

6. The DNA construct of claim 5 wherein the amplifiable gene is DHFR.

7. The DNA construct of claim 1 wherein the transcriptional regulatory 
region comprises a promoter and an enhancer. 

8. A vector comprising the DNA construct of claim 1. 

9. The vector of claim 8 wherein the selectable gene of the DNA 
construct is an amplifiable gene.

10. The vector of claim 8 that is capable of replication in a eukaryotic 
host.

11. A eukaryotic host cell comprising the vector of claim 10. 

12. A eukaryotic host cell comprising the DNA construct of claim 5. 


13. The host cell of claim 11 wherein the vector is introduced into the 
host cell by electroporation.

14. A eukaryotic host cell comprising the DNA construct of claim 1 
integrated into a chromosome of the host cell.



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15. The host cell of claim 14 that is a mammalian cell. 

16. A method for producing a product of interest comprising culturing the
host cell of claim 11 so as to express the product gene and
recovering the product from the host cell culture.

17. The method of claim 16 further comprising recovering the product from 
the culture medium. 

18. The method of claim 16 wherein the selectable gene is an amplifiable
gene and the splice donor site comprises an efficient splice donor
sequence.

19. A method for producing a product of interest comprising culturing the
host cell of claim 12 so as to express the product gene in a
selective medium comprising an amplifying agent for sufficient time
to allow amplification to occur, and recovering the product.

20. A method for producing eukaryotic cells having multiple copies of a
product gene comprising transforming eukaryotic cells with the DNA
construct of claim 5, growing the cells in a selective medium
comprising an amplifying agent for a sufficient time for
amplification to occur, and selecting cells having multiple copies
of the product gene.



21. The method of claim 20 further comprising recovering from the 
selected cells the product of interest. 
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